Abstract

Confidence intervals based on penalized maximum likelihood estima- 
tors such as the LASSO, adaptive LASSO, and hard-thresholding are an- 
alyzed. In the known-variance case, the finite-sample coverage properties 
of such intervals are determined and it is shown that symmetric inter- 
vals are the shortest. The length of the shortest intervals based on the 
hard-thresholding estimator is larger than the length of the shortest in- 
terval based on the adaptive LASSO, which is larger than the length of 
the shortest interval based on the LASSO, which in turn is larger than 
the standard interval based on the maximum likelihood estimator. In 
the case where the penalized estimators are tuned to possess the 'spar- 
sity property', the intervals based on these estimators are larger than 
the standard interval by an order of magnitude. Furthermore, a simple 
asymptotic confidence interval construction in the 'sparse' case, that also 
applies to the smoothly clipped absolute deviation estimator, is discussed. 
The results for the known-variance case are shown to carry over to the 
unknown-variance case in an appropriate asymptotic sense. 
1 Introduction



Recent years have seen an increased interest in penalized maximum likelihood 
(least squares) estimators. Prominent examples of such estimators are the 
LASSO estimator (Tibshirani (1996)) and its variants like the adaptive LASSO 
(Zou (2006)), the Bridge estimators (Frank and Friedman (1993)), or the smoothly 
clipped absolute deviation (SCAD) estimator (Fan and Li (2001)). In linear 
regression models with orthogonal regressors, the hard- and soft-thresholding 
estimators can also be reformulated as penalized least squares estimators, with 
the soft-thresholding estimator then coinciding with the LASSO estimator. 

The asymptotic distributional properties of penalized maximum likelihood 
(least squares) estimators have been studied in the literature, mostly in the con- 
text of a finite-dimensional linear regression model; see Knight and Fu (2000), 
Fan and Li (2001), and Zou (2006). Knight and Fu (2000) study the asymptotic 
distribution of Bridge estimators and, in particular, of the LASSO estimator. 
Their analysis concentrates on the case where the estimators are tuned in such 
a way as to perform conservative model selection, and their asymptotic frame- 
work allows for dependence of parameters on sample size. In contrast, Fan 
and Li (2001) for the SCAD estimator and Zou (2006) for the adaptive LASSO 
estimator concentrate on the case where the estimators are tuned to possess 
the 'sparsity' property. They show that, with such tuning, these estimators 
possess what has come to be known as the 'oracle property'. However, their 
results are based on a fixed-parameter asymptotic framework only. Pötscher
and Leeb (2009) and Pötscher and Schneider (2009) study the finite-sample
distribution of the hard-thresholding, the soft-thresholding (LASSO), the SCAD,
and the adaptive LASSO estimator under normal errors; they also obtain the 
asymptotic distributions of these estimators in a general 'moving parameter' 
asymptotic framework. The results obtained in these two papers clearly show 
that the distributions of the estimators studied are often highly non-normal and 
that the so-called 'oracle property' typically paints a misleading picture of the 
actual performance of the estimator. [In the wake of Fan and Li (2001) a con- 
siderable literature has sprung up establishing the so-called 'oracle property' for 
a variety of estimators. All these results are fixed-parameter asymptotic results 
only and can be very misleading. See Leeb and Pötscher (2008) and Pötscher
(2009) for more discussion.]

A natural question now is what all these distributional results mean for confi- 
dence intervals that are based on penalized maximum likelihood (least squares) 
estimators. This is the question we address in the present paper in the con- 
text of a normal linear regression model with orthogonal regressors. In the 
known-variance case we obtain formulae for the finite-sample infimal coverage 
probabilities of fixed-width confidence intervals based on the following estima- 
tors: hard-thresholding, LASSO (soft-thresholding), and adaptive LASSO. We 
show that among those intervals the symmetric ones are the shortest, and we 
show that hard-thresholding leads to longer intervals than the adaptive LASSO, 
which in turn leads to longer intervals than the LASSO. All these intervals are 
longer than the standard confidence interval based on the maximum likelihood 
estimator, which is in line with Joshi (1969). In case the estimators are tuned
to possess the 'sparsity' property, explicit asymptotic formulae for the length of 
the confidence intervals are furthermore obtained, showing that in this case the 
intervals based on the penalized maximum likelihood estimators are larger by 
an order of magnitude than the standard maximum likelihood based interval. 
This refines, for the particular estimators considered, a general result for
confidence sets based on 'sparse' estimators (Pötscher (2009)). Additionally, in the
'sparsely' tuned case a simple asymptotic construction of confidence intervals 
is provided that also applies to other penalized maximum likelihood estimators 
such as the SCAD estimator. Furthermore, we show how the results for the 
known-variance case carry over to the unknown-variance case in an asymptotic
sense. 

The plan of the paper is as follows: After introducing the model and
estimators in Section 2, the known-variance case is treated in Section 3 whereas
the unknown-variance case is dealt with in Section 4. All proofs as well as some
technical lemmata are relegated to the Appendix. 

2 The Model and Estimators 

For a normal linear regression model with orthogonal regressors, distributional 
properties of penalized maximum likelihood (least squares) estimators with a 
separable penalty can be reduced to the case of a Gaussian location problem; 
for details see, e.g., Pötscher and Schneider (2009). Since we are only interested
in confidence sets for individual components of the parameter vector in the
regression that are based on such estimators, we shall hence suppose that the
data $y_1, \ldots, y_n$ are independent identically distributed as $N(\theta, \sigma^2)$,
$\theta \in \mathbb{R}$, $0 < \sigma < \infty$. [This entails no loss of generality in the known-variance
case. In the unknown-variance case an explicit treatment of the orthogonal linear model
would differ from the analysis in the present paper only in that the estimator
$\hat\sigma^2$ defined below would be replaced by the usual residual variance estimator
from the least-squares regression; this would have no substantial effect on the
results.] We shall be concerned with confidence sets for $\theta$ based on penalized
maximum likelihood estimators such as the hard-thresholding estimator, the
LASSO (reducing to soft-thresholding in this setting), and the adaptive LASSO
estimator. The hard-thresholding estimator $\tilde\theta_H$ is given by

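$$\tilde\theta_H := \tilde\theta_H(\eta_n) = \bar y \, \mathbf{1}(|\bar y| > \hat\sigma \eta_n)$$

where the threshold $\eta_n$ is a positive real number, $\bar y$ denotes the maximum
likelihood estimator, i.e., the arithmetic mean of the data, and
$\hat\sigma^2 = (n-1)^{-1} \sum_{i=1}^{n} (y_i - \bar y)^2$. Also define the infeasible estimator

$$\hat\theta_H := \hat\theta_H(\eta_n) = \bar y \, \mathbf{1}(|\bar y| > \sigma \eta_n)$$

which uses the value of $\sigma$. The LASSO (or soft-thresholding) estimator $\tilde\theta_S$ is
given by

$$\tilde\theta_S := \tilde\theta_S(\eta_n) = \operatorname{sign}(\bar y)\,(|\bar y| - \hat\sigma \eta_n)_+$$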
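and its infeasible version by

$$\hat\theta_S := \hat\theta_S(\eta_n) = \operatorname{sign}(\bar y)\,(|\bar y| - \sigma \eta_n)_+ .$$

Here $\operatorname{sign}(x)$ is defined as $-1$, $0$, and $1$ in case $x < 0$, $x = 0$, and $x > 0$,
respectively, and $z_+$ is shorthand for $\max\{z, 0\}$. The adaptive LASSO estimator
$\tilde\theta_A$ in this simple model is given by

$$\tilde\theta_A := \tilde\theta_A(\eta_n) = \bar y \,(1 - \hat\sigma^2 \eta_n^2 / \bar y^2)_+ =
\begin{cases} 0 & \text{if } |\bar y| \le \hat\sigma \eta_n \\ \bar y - \hat\sigma^2 \eta_n^2 / \bar y & \text{if } |\bar y| > \hat\sigma \eta_n \end{cases}$$

and its infeasible counterpart by

$$\hat\theta_A := \hat\theta_A(\eta_n) = \bar y \,(1 - \sigma^2 \eta_n^2 / \bar y^2)_+ =
\begin{cases} 0 & \text{if } |\bar y| \le \sigma \eta_n \\ \bar y - \sigma^2 \eta_n^2 / \bar y & \text{if } |\bar y| > \sigma \eta_n . \end{cases}$$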

It coincides with the nonnegative Garotte in this simple model. For the feasible
estimators we always need to assume $n \ge 2$, whereas for the infeasible estimators
also $n = 1$ is admissible.
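The three estimators defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`hard_threshold`, `soft_threshold`, `adaptive_lasso`, `feasible_estimates`) are hypothetical, and the argument `scale` stands for $\hat\sigma$ in the feasible versions or for $\sigma$ in the infeasible versions.

```python
from statistics import mean, stdev


def hard_threshold(y_bar: float, scale: float, eta: float) -> float:
    """Return y_bar if |y_bar| > scale * eta, and 0 otherwise."""
    return y_bar if abs(y_bar) > scale * eta else 0.0


def soft_threshold(y_bar: float, scale: float, eta: float) -> float:
    """Return sign(y_bar) * (|y_bar| - scale * eta)_+."""
    sign = (y_bar > 0) - (y_bar < 0)
    return sign * max(abs(y_bar) - scale * eta, 0.0)


def adaptive_lasso(y_bar: float, scale: float, eta: float) -> float:
    """Return y_bar * (1 - (scale * eta)^2 / y_bar^2)_+ without dividing by zero."""
    if abs(y_bar) <= scale * eta:
        return 0.0
    return y_bar - (scale * eta) ** 2 / y_bar


def feasible_estimates(data: list[float], eta: float) -> dict[str, float]:
    """Feasible versions: plug in the sample mean and the (n - 1)-denominator
    standard deviation, so at least two observations are required."""
    y_bar, sigma_hat = mean(data), stdev(data)
    return {
        "hard": hard_threshold(y_bar, sigma_hat, eta),
        "soft": soft_threshold(y_bar, sigma_hat, eta),
        "adaptive": adaptive_lasso(y_bar, sigma_hat, eta),
    }
```

Passing the known $\sigma$ as `scale` gives the infeasible estimators; `feasible_estimates` uses `statistics.stdev`, which matches the definition of $\hat\sigma$ given above.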

Note that $\eta_n$ plays the role of a tuning parameter and it is most natural
to let the estimators depend on the tuning parameter only via $\hat\sigma \eta_n$ and $\sigma \eta_n$,
respectively, in order to take account of the scale of the data. This makes the
estimators mentioned above scale equivariant. We shall often suppress dependence
of the estimators on $\eta_n$ in the notation. In the following let $P_{n,\theta,\sigma}$ denote the
distribution of the sample when $\theta$ and $\sigma$ are the true parameters. Furthermore,
let $\Phi$ denote the standard normal cumulative distribution function.

We also note the following obvious fact: Since hard- and soft-thresholding 
operate in a coordinatewise fashion, the results given below also apply mutatis 
mutandis to linear regressions with non-orthogonal regressors. Of course, the 
soft-thresholding estimator then no longer coincides with the LASSO estimator. 
We refrain from spelling out details. 



3 Confidence Intervals: Known-Variance Case

In this section we consider the case where the variance $\sigma^2$ is known, $n \ge 1$
holds, and we are interested in the finite-sample coverage properties of intervals
of the form $[\hat\theta - \sigma a_n, \hat\theta + \sigma b_n]$ where $a_n$ and $b_n$ are nonnegative real numbers
and $\hat\theta$ stands for any one of the estimators $\hat\theta_H = \hat\theta_H(\eta_n)$, $\hat\theta_S = \hat\theta_S(\eta_n)$, or
$\hat\theta_A = \hat\theta_A(\eta_n)$. We shall also consider one-sided intervals $(-\infty, \hat\theta + \sigma c_n]$ and
$[\hat\theta - \sigma c_n, \infty)$ with $0 \le c_n < \infty$. Let
$p_n(\theta; \sigma, \eta_n, a_n, b_n) = P_{n,\theta,\sigma}(\theta \in [\hat\theta - \sigma a_n, \hat\theta + \sigma b_n])$
denote the coverage probability. Due to the above-noted scale equivariance of
the estimator $\hat\theta$, it is obvious that

$$p_n(\theta; \sigma, \eta_n, a_n, b_n) = p_n(\theta/\sigma; 1, \eta_n, a_n, b_n)$$

holds, and the same is true for the one-sided intervals. In particular, it follows
that the infimal coverage probabilities $\inf_{\theta \in \mathbb{R}} p_n(\theta; \sigma, \eta_n, a_n, b_n)$ do not depend
on $\sigma$. Therefore, we shall assume without loss of generality that $\sigma = 1$ for the
remainder of this section and we shall write $P_{n,\theta}$ for $P_{n,\theta,1}$.

3.1 Infimal coverage probabilities in finite samples 

We begin with soft-thresholding. Let $C_{S,n}$ denote the interval
$[\hat\theta_S - a_n, \hat\theta_S + b_n]$. We first determine the infimum of the coverage probability
$p_{S,n}(\theta) := p_{S,n}(\theta; 1, \eta_n, a_n, b_n) = P_{n,\theta}(\theta \in C_{S,n})$ of this interval.

Proposition 1 For every $n \ge 1$, the infimal coverage probability of the interval
$C_{S,n}$ is given by

$$\inf_{\theta \in \mathbb{R}} p_{S,n}(\theta) =
\begin{cases}
\Phi(n^{1/2}(a_n - \eta_n)) - \Phi(n^{1/2}(-b_n - \eta_n)) & \text{if } a_n \le b_n \\
\Phi(n^{1/2}(b_n - \eta_n)) - \Phi(n^{1/2}(-a_n - \eta_n)) & \text{if } a_n > b_n .
\end{cases} \tag{1}$$

As a point of interest we note that $p_{S,n}(\theta)$ is a piecewise constant function
with jumps at $\theta = -a_n$ and $\theta = b_n$.
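Proposition 1 is easy to check numerically. The sketch below is an illustration only: `coverage_soft` is derived here directly from the definition of the soft-thresholding estimator (with $\sigma = 1$) rather than copied from the Appendix, and all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def coverage_soft(theta: float, n: int, eta: float, a: float, b: float) -> float:
    """Coverage probability of the closed interval [est - a, est + b] at theta.

    Piecewise constant in theta, with jumps at theta = -a and theta = b.
    """
    r = sqrt(n)
    # P(est <= theta + a): the atom of the estimator at 0 is included iff theta + a >= 0.
    upper = PHI(r * (a + eta)) if theta >= -a else PHI(r * (a - eta))
    # P(est < theta - b): the atom at 0 is excluded iff theta - b <= 0.
    lower = PHI(r * (-b - eta)) if theta <= b else PHI(r * (-b + eta))
    return upper - lower


def inf_coverage_soft(n: int, eta: float, a: float, b: float) -> float:
    """Infimal coverage probability as stated in Proposition 1."""
    r = sqrt(n)
    shorter, longer = min(a, b), max(a, b)
    return PHI(r * (shorter - eta)) - PHI(r * (-longer - eta))
```

Because the coverage probability takes only three values, evaluating `coverage_soft` at one point in each of the three regions and taking the minimum reproduces `inf_coverage_soft`.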

Next we turn to hard-thresholding. Let $C_{H,n}$ denote the interval
$[\hat\theta_H - a_n, \hat\theta_H + b_n]$. The infimum of the coverage probability
$p_{H,n}(\theta) := p_{H,n}(\theta; 1, \eta_n, a_n, b_n) = P_{n,\theta}(\theta \in C_{H,n})$ of this interval has been
obtained in Proposition 3.1 in Pötscher (2009), which we repeat for convenience.

Proposition 2 For every $n \ge 1$, the infimal coverage probability of the interval
$C_{H,n}$ is given by

$$\inf_{\theta \in \mathbb{R}} p_{H,n}(\theta) =
\begin{cases}
\Phi(n^{1/2}(a_n - \eta_n)) - \Phi(-n^{1/2} b_n) & \text{if } \eta_n \le a_n + b_n \text{ and } a_n \le b_n \\
\Phi(n^{1/2}(b_n - \eta_n)) - \Phi(-n^{1/2} a_n) & \text{if } \eta_n \le a_n + b_n \text{ and } a_n > b_n \\
0 & \text{if } \eta_n > a_n + b_n .
\end{cases} \tag{2}$$

For later use we observe that the interval $C_{H,n}$ has positive infimal coverage
probability if and only if the length of the interval $a_n + b_n$ is larger than $\eta_n$.
As a point of interest we also note that the coverage probability $p_{H,n}(\theta)$ is
discontinuous (with discontinuity points at $\theta = -a_n$ and $\theta = b_n$). Furthermore,
as discussed in Pötscher (2009), the infimum in (2) is attained if $\eta_n > a_n + b_n$,
but not in case $\eta_n \le a_n + b_n$.
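A minimal sketch of formula (2), assuming $\sigma = 1$; the function name `inf_coverage_hard` is hypothetical. It also makes the remark on positivity concrete: the value is zero whenever $\eta_n \ge a_n + b_n$.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def inf_coverage_hard(n: int, eta: float, a: float, b: float) -> float:
    """Infimal coverage probability of [est - a, est + b] as stated in Proposition 2."""
    if eta > a + b:
        # The interval is too short relative to the threshold.
        return 0.0
    r = sqrt(n)
    shorter, longer = min(a, b), max(a, b)
    return PHI(r * (shorter - eta)) - PHI(-r * longer)
```

At the boundary $\eta_n = a_n + b_n$ with $a_n \le b_n$ the two normal terms coincide, so the formula returns zero there as well, in line with the "if and only if" statement above.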

Finally, we consider the adaptive LASSO. Let $C_{A,n}$ denote the interval
$[\hat\theta_A - a_n, \hat\theta_A + b_n]$. The infimum of the coverage probability
$p_{A,n}(\theta) := p_{A,n}(\theta; 1, \eta_n, a_n, b_n) = P_{n,\theta}(\theta \in C_{A,n})$ of this interval is given next.

Proposition 3 For every $n \ge 1$, the infimal coverage probability of $C_{A,n}$ is
given by

$$\inf_{\theta \in \mathbb{R}} p_{A,n}(\theta) = \Phi(n^{1/2}(a_n - \eta_n)) - \Phi\left(n^{1/2}\left((a_n - b_n)/2 - \sqrt{((a_n + b_n)/2)^2 + \eta_n^2}\right)\right)$$

if $a_n \le b_n$, and by

$$\inf_{\theta \in \mathbb{R}} p_{A,n}(\theta) = \Phi(n^{1/2}(b_n - \eta_n)) - \Phi\left(n^{1/2}\left((b_n - a_n)/2 - \sqrt{((a_n + b_n)/2)^2 + \eta_n^2}\right)\right)$$

if $a_n > b_n$.

We note that $p_{A,n}$ is continuous except at $\theta = b_n$ and $\theta = -a_n$ and that the
infimum of $p_{A,n}$ is not attained, which can be seen from a simple refinement of
the proof of Proposition 3.
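The formula of Proposition 3 can be sketched in the same way as the previous two, again assuming $\sigma = 1$; the function name `inf_coverage_adaptive` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def inf_coverage_adaptive(n: int, eta: float, a: float, b: float) -> float:
    """Infimal coverage probability of [est - a, est + b] as stated in Proposition 3."""
    r = sqrt(n)
    shorter, longer = min(a, b), max(a, b)
    half_length = (a + b) / 2
    first = PHI(r * (shorter - eta))
    second = PHI(r * ((shorter - longer) / 2 - sqrt(half_length**2 + eta**2)))
    return first - second
```

For a symmetric interval ($a_n = b_n$) the second term reduces to $\Phi(-n^{1/2}\sqrt{a_n^2 + \eta_n^2})$, which is the form used in equation (5) of Theorem 5.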

Remark 4 (i) If we consider the open interval $C^{\circ}_{S,n} = (\hat\theta_S - a_n, \hat\theta_S + b_n)$ the
formula for the coverage probability becomes

$$\begin{aligned}
P_{n,\theta}(\theta \in C^{\circ}_{S,n}) ={}& [\Phi(n^{1/2}(a_n - \eta_n)) - \Phi(n^{1/2}(-b_n - \eta_n))] \, \mathbf{1}(\theta \le -a_n) \\
&+ [\Phi(n^{1/2}(a_n + \eta_n)) - \Phi(n^{1/2}(-b_n - \eta_n))] \, \mathbf{1}(-a_n < \theta < b_n) \\
&+ [\Phi(n^{1/2}(a_n + \eta_n)) - \Phi(n^{1/2}(-b_n + \eta_n))] \, \mathbf{1}(b_n \le \theta).
\end{aligned}$$

As a consequence, the infimal coverage probability of $C^{\circ}_{S,n}$ is again given by (1).
A fortiori, the half-open intervals $(\hat\theta_S - a_n, \hat\theta_S + b_n]$ and $[\hat\theta_S - a_n, \hat\theta_S + b_n)$ then
also have infimal coverage probability given by (1).

(ii) For the open interval $C^{\circ}_{H,n} = (\hat\theta_H - a_n, \hat\theta_H + b_n)$ the coverage probability
satisfies

$$P_{n,\theta}(\theta \in C^{\circ}_{H,n}) = P_{n,\theta}(\theta \in C_{H,n})
- [\mathbf{1}(\theta = b_n) + \mathbf{1}(\theta = -a_n)] \, [\Phi(n^{1/2}(-\theta + \eta_n)) - \Phi(n^{1/2}(-\theta - \eta_n))].$$

Inspection of the proof of Proposition 3.1 in Pötscher (2009) then shows that
$C^{\circ}_{H,n}$ has the same infimal coverage probability as $C_{H,n}$. However, now the
infimum is always a minimum. Furthermore, the half-open intervals
$(\hat\theta_H - a_n, \hat\theta_H + b_n]$ and $[\hat\theta_H - a_n, \hat\theta_H + b_n)$ then a fortiori have infimal coverage probability
given by (2); for these intervals the infimum is attained if $\eta_n > a_n + b_n$, but not
necessarily if $\eta_n \le a_n + b_n$.

(iii) If $C^{\circ}_{A,n}$ denotes the open interval $(\hat\theta_A - a_n, \hat\theta_A + b_n)$, the formula for the
coverage probability becomes

$$P_{n,\theta}(\theta \in C^{\circ}_{A,n}) =
\begin{cases}
\Phi(n^{1/2}\gamma^{(-)}(\theta, -a_n)) - \Phi(n^{1/2}\gamma^{(-)}(\theta, b_n)) & \text{if } \theta \le -a_n \\
\Phi(n^{1/2}\gamma^{(+)}(\theta, -a_n)) - \Phi(n^{1/2}\gamma^{(-)}(\theta, b_n)) & \text{if } -a_n < \theta < b_n \\
\Phi(n^{1/2}\gamma^{(+)}(\theta, -a_n)) - \Phi(n^{1/2}\gamma^{(+)}(\theta, b_n)) & \text{if } \theta \ge b_n ,
\end{cases}$$

where $\gamma^{(-)}$ and $\gamma^{(+)}$ are defined in (17) and (18) in the Appendix. Again the
coverage probability is continuous except at $\theta = b_n$ and $\theta = -a_n$ (and is continuous
everywhere in the trivial case $a_n = b_n = 0$). It is now easy to see that the
infimal coverage probability of $C^{\circ}_{A,n}$ coincides with the infimal coverage probability
of the closed interval $C_{A,n}$, the infimum of the coverage probability of $C^{\circ}_{A,n}$ now
always being a minimum. Furthermore, the half-open intervals $(\hat\theta_A - a_n, \hat\theta_A + b_n]$
and $[\hat\theta_A - a_n, \hat\theta_A + b_n)$ a fortiori have the same infimal coverage probability as
$C_{A,n}$ and $C^{\circ}_{A,n}$.
(iv) The one-sided intervals $(-\infty, \hat\theta_S + c_n]$, $(-\infty, \hat\theta_S + c_n)$, $[\hat\theta_S - c_n, \infty)$,
$(\hat\theta_S - c_n, \infty)$, $(-\infty, \hat\theta_H + c_n]$, $(-\infty, \hat\theta_H + c_n)$, $[\hat\theta_H - c_n, \infty)$, $(\hat\theta_H - c_n, \infty)$,
$(-\infty, \hat\theta_A + c_n]$, $(-\infty, \hat\theta_A + c_n)$, $(\hat\theta_A - c_n, \infty)$, and $[\hat\theta_A - c_n, \infty)$, with $c_n$ a nonnegative
real number, have infimal coverage probability $\Phi(n^{1/2}(c_n - \eta_n))$. This is easy
to see for soft-thresholding, follows from the reasoning in Pötscher (2009) for
hard-thresholding, and for the adaptive LASSO follows by similar, but simpler,
reasoning as in the proof of Proposition 3.

3.2 Symmetric intervals are shortest 

For the two-sided confidence sets considered above, we next show that given 
a prescribed infimal coverage probability the symmetric intervals are shortest. 
We then show that these shortest intervals are longer than the standard interval 
based on the maximum likelihood estimator and quantify the excess length of 
these intervals. 

Theorem 5 For every $n \ge 1$ and every $\delta$ satisfying $0 < \delta < 1$ we have:

(a) Among all intervals $C_{S,n}$ with infimal coverage probability not less than
$\delta$ there is a unique shortest interval $C^*_{S,n} = [\hat\theta_S - a^*_{n,S}, \hat\theta_S + b^*_{n,S}]$ characterized
by $a^*_{n,S} = b^*_{n,S}$ with $a^*_{n,S}$ being the unique solution of

$$\Phi(n^{1/2}(a_n - \eta_n)) - \Phi(n^{1/2}(-a_n - \eta_n)) = \delta. \tag{3}$$

The interval $C^*_{S,n}$ has infimal coverage probability equal to $\delta$ and $a^*_{n,S}$ is positive.

(b) Among all intervals $C_{H,n}$ with infimal coverage probability not less than
$\delta$ there is a unique shortest interval $C^*_{H,n} = [\hat\theta_H - a^*_{n,H}, \hat\theta_H + b^*_{n,H}]$ characterized
by $a^*_{n,H} = b^*_{n,H}$ with $a^*_{n,H}$ being the unique solution of

$$\Phi(n^{1/2}(a_n - \eta_n)) - \Phi(-n^{1/2} a_n) = \delta. \tag{4}$$

The interval $C^*_{H,n}$ has infimal coverage probability equal to $\delta$ and $a^*_{n,H}$ satisfies
$a^*_{n,H} > \eta_n / 2$.

(c) Among all intervals $C_{A,n}$ with infimal coverage probability not less than
$\delta$ there is a unique shortest interval $C^*_{A,n} = [\hat\theta_A - a^*_{n,A}, \hat\theta_A + b^*_{n,A}]$ characterized
by $a^*_{n,A} = b^*_{n,A}$ with $a^*_{n,A}$ being the unique solution of

$$\Phi(n^{1/2}(a_n - \eta_n)) - \Phi\left(-n^{1/2}\sqrt{a_n^2 + \eta_n^2}\right) = \delta. \tag{5}$$

The interval $C^*_{A,n}$ has infimal coverage probability equal to $\delta$ and $a^*_{n,A}$ is positive.

In the statistically uninteresting case $\delta = 0$ the interval with $a_n = b_n = 0$
is the unique shortest interval in all three cases. However, for the case of the
hard-thresholding estimator also any interval with $a_n = b_n$ and $a_n \le \eta_n / 2$ has
infimal coverage probability equal to zero.
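Equations (3), (4), and (5) have no closed-form solution, but since each left-hand side is strictly increasing in $a_n$, the half-lengths can be found by bisection. The following is a sketch under that observation, with $\sigma = 1$; the function names and the bracketing interval $[0, \eta_n + 40 n^{-1/2}]$ are choices made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()


def _bisect_increasing(f, target: float, lo: float, hi: float, iters: int = 200) -> float:
    """Solve f(a) = target for increasing f, assuming f(lo) <= target <= f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def shortest_half_lengths(n: int, eta: float, delta: float) -> dict[str, float]:
    """Half-lengths solving (3), (4), (5), plus the standard ML half-length."""
    r = sqrt(n)
    left_sides = {
        "soft": lambda a: N01.cdf(r * (a - eta)) - N01.cdf(r * (-a - eta)),
        "hard": lambda a: N01.cdf(r * (a - eta)) - N01.cdf(-r * a),
        "adaptive": lambda a: N01.cdf(r * (a - eta)) - N01.cdf(-r * sqrt(a * a + eta * eta)),
    }
    upper = eta + 40.0 / r  # every left-hand side equals 1 here up to rounding
    result = {name: _bisect_increasing(f, delta, 0.0, upper) for name, f in left_sides.items()}
    result["ml"] = N01.inv_cdf((1 + delta) / 2) / r
    return result
```

For example, `shortest_half_lengths(4, 0.5, 0.9)` returns four half-lengths that can be compared directly with one another.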

Given that the distributions of the estimation errors $\hat\theta_S - \theta$, $\hat\theta_H - \theta$, and
$\hat\theta_A - \theta$ are not symmetric (see Pötscher and Leeb (2009), Pötscher and Schneider
(2009)), it may seem surprising at first glance that the shortest confidence
intervals are symmetric. Some intuition for this phenomenon can be gained on
the grounds that the distributions of the estimation errors under $\theta = \tau$ and
$\theta = -\tau$ are mirror-images of one another.

The above theorem shows that given a prespecified $\delta$ ($0 < \delta < 1$), the
shortest confidence set with infimal coverage probability equal to $\delta$ based on
the soft-thresholding (LASSO) estimator is shorter than the corresponding
interval based on the adaptive LASSO estimator, which in turn is shorter than
the corresponding interval based on the hard-thresholding estimator. All three
intervals are longer than the corresponding standard confidence interval based
on the maximum likelihood estimator. That is,

$$a^*_{n,H} > a^*_{n,A} > a^*_{n,S} > n^{-1/2}\,\Phi^{-1}((1 + \delta)/2).$$

Figure 1 below shows $n^{1/2}$ times the half-length of the shortest $\delta$-level confidence
intervals based on hard-thresholding, adaptive LASSO, soft-thresholding, and
the maximum likelihood estimator, respectively, as a function of $n^{1/2}\eta_n$ for
various values of $\delta$. The graphs illustrate that the intervals based on
hard-thresholding, adaptive LASSO, and soft-thresholding substantially exceed the
length of the maximum likelihood based interval except if $n^{1/2}\eta_n$ is very small.
For large values of $n^{1/2}\eta_n$ the graphs suggest a linear increase in the length of
the intervals based on the penalized estimators. This is formally confirmed in
Section 3.2.1 below.





Figure 1: $n^{1/2} a^*_{n,H}$, $n^{1/2} a^*_{n,A}$, $n^{1/2} a^*_{n,S}$ as a function of $n^{1/2}\eta_n$ for coverage
probabilities $\delta = 0.5, 0.8, 0.9, 0.95$. The horizontal line at height $\Phi^{-1}((1 + \delta)/2)$
indicates $n^{1/2}$ times the half-length of the standard maximum likelihood based
interval.
3.2.1 Asymptotic behavior of the length

It is well-known that as $n \to \infty$ two different regimes for the tuning parameter
$\eta_n$ can be distinguished. In the first regime $\eta_n \to 0$ and $n^{1/2}\eta_n \to e$, $0 \le e < \infty$.
This choice of tuning parameter leads to estimators $\hat\theta_S$, $\hat\theta_H$, and $\hat\theta_A$ that perform
conservative model selection. In the second regime $\eta_n \to 0$ and $n^{1/2}\eta_n \to \infty$,
leading to estimators $\hat\theta_S$, $\hat\theta_H$, and $\hat\theta_A$ that perform consistent model selection
(also known as the 'sparsity property'); that is, with probability approaching 1,
the estimators are exactly zero if the true value $\theta = 0$, and they are different
from zero if $\theta \ne 0$. See Pötscher and Leeb (2009) and Pötscher and Schneider
(2009) for a detailed discussion. We now discuss the asymptotic behavior, under
the two regimes, of the half-lengths $a^*_{n,S}$, $a^*_{n,H}$, and $a^*_{n,A}$ of the shortest intervals
$C^*_{S,n}$, $C^*_{H,n}$, and $C^*_{A,n}$ with a fixed infimal coverage probability $\delta$, $0 < \delta < 1$.

If $\eta_n \to 0$ and $n^{1/2}\eta_n \to e$, $0 < e < \infty$, then it follows immediately from
Theorem 5 that $n^{1/2} a^*_{n,S}$, $n^{1/2} a^*_{n,H}$, and $n^{1/2} a^*_{n,A}$ converge to the unique solutions
of

$$\Phi(a - e) - \Phi(-a - e) = \delta, \tag{6}$$
$$\Phi(a - e) - \Phi(-a) = \delta, \tag{7}$$

and

$$\Phi\left(\sqrt{a^2 + e^2}\right) - \Phi(-a + e) = \delta, \tag{8}$$

respectively. [Actually, this is even true if $e = 0$.] Hence, while $a^*_{n,H}$, $a^*_{n,A}$,
and $a^*_{n,S}$ are larger than the half-length $n^{-1/2}\Phi^{-1}((1 + \delta)/2)$ of the standard
interval, they are of the same order $n^{-1/2}$.

The situation is different, however, if $\eta_n \to 0$ but $n^{1/2}\eta_n \to \infty$. In this case
Theorem 5 shows that

$$\Phi(n^{1/2}(a^*_{n,S} - \eta_n)) \to \delta$$

since $n^{1/2}(-a^*_{n,S} - \eta_n) \le -n^{1/2}\eta_n \to -\infty$. In other words,

$$a^*_{n,S} = \eta_n + n^{-1/2}\Phi^{-1}(\delta) + o(n^{-1/2}). \tag{9}$$

Similarly, noting that $n^{1/2} a^*_{n,H} > n^{1/2}\eta_n / 2 \to \infty$, we get

$$a^*_{n,H} = \eta_n + n^{-1/2}\Phi^{-1}(\delta) + o(n^{-1/2}); \tag{10}$$

and since $n^{1/2}\sqrt{(a^*_{n,A})^2 + \eta_n^2} \ge n^{1/2}\eta_n \to \infty$ we obtain

$$a^*_{n,A} = \eta_n + n^{-1/2}\Phi^{-1}(\delta) + o(n^{-1/2}). \tag{11}$$

[Actually, the condition $\eta_n \to 0$ has not been used in the derivation of (9)-(11).]
Hence, the intervals $C^*_{S,n}$, $C^*_{H,n}$, and $C^*_{A,n}$ are asymptotically of the
same length. They are also longer than the standard interval by an order of
magnitude: the ratio of each of $a^*_{n,S}$ ($a^*_{n,H}$, $a^*_{n,A}$, respectively) to the half-length
of the standard interval is of the order $n^{1/2}\eta_n$, which diverges to infinity. Hence, when the
estimators $\hat\theta_S$, $\hat\theta_H$, and $\hat\theta_A$ are tuned to possess the 'sparsity property', the
corresponding confidence sets become very large. For the particular intervals
considered here this is a refinement of a general result in Pötscher (2009) for
confidence sets based on arbitrary estimators possessing the 'sparsity property'.
[We note that the sparsely tuned hard-thresholding estimator or the sparsely
tuned adaptive LASSO (under an additional condition on $\eta_n$) are known to
possess the so-called 'oracle property'. In light of the 'oracle property' it is
sometimes argued in the literature that valid confidence intervals based on these
estimators with length proportional to $n^{-1/2}$ can be obtained. However, in
light of the above discussion such intervals necessarily have infimal coverage
probability that converges to zero and thus are not valid. This once more
shows that fixed-parameter asymptotic results like the 'oracle' property can be
dangerously misleading.]
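The expansion (9) and the resulting growth of the length ratio can be illustrated numerically. The sketch below is an illustration only: it assumes the particular 'sparse' tuning $\eta_n = n^{-1/4}$ (so that $n^{1/2}\eta_n = n^{1/4} \to \infty$), solves (3) by bisection, and uses hypothetical function names.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()


def soft_half_length(n: int, eta: float, delta: float, iters: int = 200) -> float:
    """Solve equation (3) for the half-length by bisection (sigma = 1)."""
    r = sqrt(n)
    lo, hi = 0.0, eta + 40.0 / r
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        coverage = N01.cdf(r * (mid - eta)) - N01.cdf(r * (-mid - eta))
        if coverage < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def leading_terms(n: int, eta: float, delta: float) -> float:
    """eta_n + n^{-1/2} * Phi^{-1}(delta): the right side of (9) without the o-term."""
    return eta + N01.inv_cdf(delta) / sqrt(n)


def ml_half_length(n: int, delta: float) -> float:
    """Half-length of the standard interval based on the ML estimator."""
    return N01.inv_cdf((1 + delta) / 2) / sqrt(n)


def length_ratio(n: int, delta: float) -> float:
    """Ratio of the soft-thresholding half-length to the ML half-length."""
    eta = n ** -0.25  # assumed sparse tuning, chosen only for this illustration
    return soft_half_length(n, eta, delta) / ml_half_length(n, delta)
```

Under this tuning, `length_ratio` grows with `n`, while the difference between `soft_half_length` and `leading_terms`, multiplied by $n^{1/2}$, becomes negligible.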

3.3 A simple asymptotic confidence interval 

The results for the finite-sample confidence intervals given in Section 3.1
required a detailed case by case analysis based on the finite-sample distribution
of the estimator on which the interval is based. If the estimators $\hat\theta_S$, $\hat\theta_H$, and
$\hat\theta_A$ are tuned to possess the 'sparsity property', i.e., if the tuning parameter
satisfies $\eta_n \to 0$ and $n^{1/2}\eta_n \to \infty$, a simple asymptotic confidence interval
construction relying on asymptotic results obtained in Pötscher and Leeb (2009)
and Pötscher and Schneider (2009) is possible as shown below. An advantage of
this construction is that it easily extends to other estimators like the smoothly
clipped absolute deviation (SCAD) estimator when tuned to possess the
'sparsity property'.

As shown in Pötscher and Leeb (2009) and Pötscher and Schneider (2009),
the uniform rate of consistency of the 'sparsely' tuned estimators $\hat\theta_S$, $\hat\theta_H$, and
$\hat\theta_A$ is not $n^{1/2}$, but only $\eta_n^{-1}$; furthermore, the limiting distributions of these
estimators under the appropriate $\eta_n^{-1}$-scaling and under a moving-parameter
asymptotic framework are always concentrated on the interval $[-1, 1]$. These
facts can be used to obtain the following result.

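Proposition 6 Suppose $\eta_n \to 0$ and $n^{1/2}\eta_n \to \infty$. Let $\hat\theta$ stand for any of the
estimators $\hat\theta_S(\eta_n)$, $\hat\theta_H(\eta_n)$, or $\hat\theta_A(\eta_n)$. Let $d$ be a real number, and define the
interval $D_n = [\hat\theta - d\eta_n, \hat\theta + d\eta_n]$. If $d > 1$, the interval $D_n$ has infimal coverage
probability converging to 1, i.e.,

$$\lim_{n \to \infty} \inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in D_n) = 1.$$

If $d < 1$,

$$\lim_{n \to \infty} \inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in D_n) = 0.$$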

The asymptotic distributional results in the above proposition do not provide
information on the case $d = 1$. However, from the finite-sample results in Section
3.1 we see that in this case the infimal coverage probability of $D_n$ converges to
$1/2$.
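The three cases $d < 1$, $d = 1$, and $d > 1$ can be illustrated with the finite-sample formulas of Propositions 1 to 3 applied to the symmetric choice $a_n = b_n = d\eta_n$. This is a sketch only: the tuning $\eta_n = n^{-1/4}$ and the function name `inf_coverage_D` are assumptions made for the illustration.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def inf_coverage_D(estimator: str, n: int, d: float) -> float:
    """Infimal coverage of [est - d*eta_n, est + d*eta_n] with eta_n = n^(-1/4)."""
    eta = n ** -0.25  # assumed tuning: eta_n -> 0 and sqrt(n) * eta_n -> infinity
    r = sqrt(n)
    a = d * eta       # half-length of D_n
    upper = PHI(r * (a - eta))
    if estimator == "soft":
        return upper - PHI(r * (-a - eta))
    if estimator == "hard":
        return 0.0 if eta > 2 * a else upper - PHI(-r * a)
    if estimator == "adaptive":
        return upper - PHI(-r * sqrt(a * a + eta * eta))
    raise ValueError(f"unknown estimator: {estimator}")
```

For large `n`, the value is close to 0 for `d = 0.9`, close to 1/2 for `d = 1.0`, and close to 1 for `d = 1.1`, for each of the three estimators.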

Since the interval $D_n$ for $d > 1$ has asymptotic infimal coverage probability
equal to one, one may wonder how much cruder this interval is compared to the
finite-sample intervals $C^*_{S,n}$, $C^*_{H,n}$, and $C^*_{A,n}$ constructed in Section 3.2 which
have infimal coverage probability equal to a prespecified level $\delta$, $0 < \delta < 1$: The
ratio of the half-length of $D_n$ to the half-length of the corresponding interval
$C^*_{S,n}$, $C^*_{H,n}$, and $C^*_{A,n}$ is

$$d(1 + O(n^{-1/2}\eta_n^{-1})) = d(1 + o(1))$$

as can be seen from equations (9), (10), and (11). Since $d$ can be chosen
arbitrarily close to one, this ratio can be made arbitrarily close to one. This may
sound somewhat strange, since we are comparing an interval with asymptotic
infimal coverage probability 1 with the shortest finite-sample confidence intervals
that have a fixed infimal coverage probability $\delta$ less than 1. The reason for this
phenomenon is that, in the relevant moving-parameter asymptotic framework,
the distribution of $\hat\theta - \theta$ is made up of a bias-component which in the worst
case is of the order $\eta_n$ and a random component which is of the order $n^{-1/2}$.
Since $\eta_n \to 0$ and $n^{1/2}\eta_n \to \infty$, the deterministic bias-component dominates
the random component. This can also be gleaned from equations (9), (10), and
(11), where the level $\delta$ enters the formula for the half-length only in the lower
order term.

We note that using Theorem 19 in Pötscher and Leeb (2009) the same proof
immediately shows that Proposition 6 also holds for the smoothly clipped
absolute deviation (SCAD) estimator when tuned to possess the 'sparsity property'.
In fact, the argument in the proof of the above proposition can be applied to
a large class of post-model-selection estimators based on a consistent model
selection procedure.

Remark 7 (i) Suppose $D'_n = [\hat\theta - d_1\eta_n, \hat\theta + d_2\eta_n]$ where $\hat\theta$ stands for any of
the estimators $\hat\theta_S$, $\hat\theta_H$, or $\hat\theta_A$. If $\min(d_1, d_2) > 1$, then the limit of the infimal
coverage probability of $D'_n$ is 1; if $\max(d_1, d_2) < 1$ then this limit is zero. This
follows immediately from an inspection of the proof of Proposition 6.

(ii) Proposition 6 also remains correct if $D_n$ is replaced by the corresponding
open interval. A similar comment applies to the open version of $D'_n$.

4 Confidence Intervals: Unknown-Variance Case

In this section we consider the case where the variance $\sigma^2$ is unknown, $n \ge 2$,
and we are interested in the coverage properties of intervals of the form
$[\tilde\theta - \hat\sigma a_n, \tilde\theta + \hat\sigma a_n]$ where $a_n$ is a nonnegative real number and $\tilde\theta$ stands for any
one of the estimators $\tilde\theta_H = \tilde\theta_H(\eta_n)$, $\tilde\theta_S = \tilde\theta_S(\eta_n)$, or $\tilde\theta_A = \tilde\theta_A(\eta_n)$. For brevity
we only consider symmetric intervals. A similar argument as in the
known-variance case shows that we can assume without loss of generality that $\sigma = 1$,
and we shall do so in the sequel; in particular, this argument shows that the
infimum with respect to $\theta$ of the coverage probability does not depend on $\sigma$.



4.1 Soft-thresholding

Consider the interval $E_{S,n} = [\tilde\theta_S - \hat\sigma a_n, \tilde\theta_S + \hat\sigma a_n]$ where $a_n$ is a nonnegative
real number and $\tilde\theta_S = \tilde\theta_S(\eta_n)$. We then have

$$P_{n,\theta}(\theta \in E_{S,n}) = \int_0^\infty P_{n,\theta}(\theta \in E_{S,n} \mid \hat\sigma = s)\, h_n(s)\, ds$$

where $h_n$ is the density of $\hat\sigma$, i.e., $h_n$ is the density of the square root of a
chi-square distributed random variable with $n - 1$ degrees of freedom divided by the
degrees of freedom. In view of independence of $\hat\sigma$ and $\bar y$ we obtain the following
representation of the finite-sample coverage probability



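$$\begin{aligned}
P_{n,\theta}(\theta \in E_{S,n}) &= \int_0^\infty P_{n,\theta}\left(\theta \in [\hat\theta_S(s\eta_n) - s a_n, \hat\theta_S(s\eta_n) + s a_n]\right) h_n(s)\, ds \\
&= \int_0^\infty p_{S,n}(\theta; 1, s\eta_n, s a_n, s a_n)\, h_n(s)\, ds
\end{aligned} \tag{12}$$

where $p_{S,n}$ is given in (15) in the Appendix.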

We next determine the infimal coverage probability of $E_{S,n}$ in finite samples:
It follows from (15), the dominated convergence theorem, and symmetry of the
standard normal distribution that

$$\begin{aligned}
\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in E_{S,n}) &\le \lim_{\theta \to \infty} \int_0^\infty p_{S,n}(\theta; 1, s\eta_n, s a_n, s a_n)\, h_n(s)\, ds \\
&= \int_0^\infty \lim_{\theta \to \infty} p_{S,n}(\theta; 1, s\eta_n, s a_n, s a_n)\, h_n(s)\, ds \\
&= \int_0^\infty [\Phi(n^{1/2} s (a_n - \eta_n)) - \Phi(n^{1/2} s (-a_n - \eta_n))]\, h_n(s)\, ds \\
&= T_{n-1}(n^{1/2}(a_n - \eta_n)) - T_{n-1}(n^{1/2}(-a_n - \eta_n)),
\end{aligned} \tag{13}$$

where $T_{n-1}$ is the cdf of a Student $t$-distribution with $n - 1$ degrees of freedom.
Furthermore, (1) shows that

$$p_{S,n}(\theta; 1, s\eta_n, s a_n, s a_n) \ge \Phi(n^{1/2} s (a_n - \eta_n)) - \Phi(n^{1/2} s (-a_n - \eta_n))$$

holds and whence we obtain from (12) and (13) the following expression for the
infimal coverage probability of $E_{S,n}$:

$$\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in E_{S,n}) = T_{n-1}(n^{1/2}(a_n - \eta_n)) - T_{n-1}(n^{1/2}(-a_n - \eta_n)) \tag{14}$$

for every $n \ge 2$. Remark 4 shows that the same relation is true for the
corresponding open and half-open intervals.
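Formula (14) only requires the cdf of a Student $t$-distribution. Python's standard library does not provide one, so the sketch below approximates it by composite Simpson integration of the $t$ density; this numerical shortcut and all function names are assumptions of the illustration, and a statistics library would normally be used instead.

```python
from math import exp, lgamma, pi, sqrt


def student_t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """Approximate the Student t cdf by Simpson's rule on [0, |x|] (steps must be even)."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * (
        (df * pi) and __import__("math").log(df * pi)
    )
    const = exp(log_const)

    def density(t: float) -> float:
        return const * (1 + t * t / df) ** (-(df + 1) / 2)

    upper = abs(x)
    h = upper / steps
    total = density(0.0) + density(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def inf_coverage_soft_unknown(n: int, eta: float, a: float) -> float:
    """Infimal coverage of the symmetric interval E_{S,n} according to (14); needs n >= 2."""
    r = sqrt(n)
    df = n - 1
    return student_t_cdf(r * (a - eta), df) - student_t_cdf(r * (-a - eta), df)
```

Replacing `student_t_cdf` by the standard normal cdf in `inf_coverage_soft_unknown` gives back the known-variance formula (1) for symmetric intervals, which is the comparison taken up in the next paragraph.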
Relation (14) shows the following: suppose $1/2 < \delta < 1$ and $a^*_{n,S}$ solves (3),
i.e., the corresponding interval $C^*_{S,n}$ has infimal coverage probability equal to $\delta$.
Let $a^{**}_{n,S}$ be the (unique) solution to

$$T_{n-1}(n^{1/2}(a_n - \eta_n)) - T_{n-1}(n^{1/2}(-a_n - \eta_n)) = \delta,$$

i.e., the corresponding interval $E^{**}_{S,n} = [\tilde\theta_S - \hat\sigma a^{**}_{n,S}, \tilde\theta_S + \hat\sigma a^{**}_{n,S}]$ has infimal
coverage probability equal to $\delta$. Then $a^{**}_{n,S} \ge a^*_{n,S}$ holds in view of a lemma
in the Appendix. I.e., given the same infimal coverage probability $\delta > 1/2$, the
expected length of the interval $E^{**}_{S,n}$ based on $\tilde\theta_S$ is not smaller than the length
of the interval $C^*_{S,n}$ based on $\hat\theta_S$.

Since $\|\Phi - T_{n-1}\|_\infty = \sup_{x \in \mathbb{R}} |\Phi(x) - T_{n-1}(x)| \to 0$ for $n \to \infty$ holds by
Polya's theorem, the following result is an immediate consequence of (14),
Proposition 1 and Remark 4.

Theorem 8 For every sequence $a_n$ of nonnegative real numbers we have with
$E_{S,n} = [\tilde\theta_S - \hat\sigma a_n, \tilde\theta_S + \hat\sigma a_n]$ and $C_{S,n} = [\hat\theta_S - a_n, \hat\theta_S + a_n]$ that

$$\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in E_{S,n}) - \inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in C_{S,n}) \to 0$$

as $n \to \infty$. The analogous results hold for the corresponding open and half-open
intervals.

We discuss this theorem together with the parallel results for hard-thresholding
and adaptive LASSO based intervals in Section 4.4.



4.2 Hard-thresholding 



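Consider the interval $E_{H,n} = [\tilde\theta_H - \hat\sigma a_n, \tilde\theta_H + \hat\sigma a_n]$ where $a_n$ is a nonnegative
real number and $\tilde\theta_H = \tilde\theta_H(\eta_n)$. We then have analogously as in the preceding
subsection that

$$P_{n,\theta}(\theta \in E_{H,n}) = \int_0^\infty p_{H,n}(\theta; 1, s\eta_n, s a_n, s a_n)\, h_n(s)\, ds.$$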


Note that $p_{H,n}(\theta; 1, s\eta_n, sa_n, sa_n)$ is symmetric in $\theta$ and for $\theta \ge 0$ is given by (see Pötscher (2009))

$$
\begin{aligned}
p_{H,n}(\theta; 1, s\eta_n, sa_n, sa_n)
={}& \{\Phi(n^{1/2}(-\theta + s\eta_n)) - \Phi(n^{1/2}(-\theta - s\eta_n))\}\, \mathbf{1}(0 \le \theta \le sa_n) \\
&+ \max[0, \Phi(n^{1/2}sa_n) - \Phi(n^{1/2}(-\theta + s\eta_n))]\, \mathbf{1}(sa_n < \theta \le s\eta_n + sa_n) \\
&+ \{\Phi(n^{1/2}sa_n) - \Phi(-n^{1/2}sa_n)\}\, \mathbf{1}(s\eta_n + sa_n < \theta)
\end{aligned}
$$

in case $\eta_n > 2a_n$, by

$$
\begin{aligned}
p_{H,n}(\theta; 1, s\eta_n, sa_n, sa_n)
={}& \{\Phi(n^{1/2}(-\theta + s\eta_n)) - \Phi(n^{1/2}(-\theta - s\eta_n))\}\, \mathbf{1}(0 \le \theta \le s\eta_n - sa_n) \\
&+ \{\Phi(n^{1/2}sa_n) - \Phi(n^{1/2}(-\theta - s\eta_n))\}\, \mathbf{1}(s\eta_n - sa_n < \theta \le sa_n) \\
&+ \{\Phi(n^{1/2}sa_n) - \Phi(n^{1/2}(-\theta + s\eta_n))\}\, \mathbf{1}(sa_n < \theta \le s\eta_n + sa_n) \\
&+ \{\Phi(n^{1/2}sa_n) - \Phi(-n^{1/2}sa_n)\}\, \mathbf{1}(s\eta_n + sa_n < \theta)
\end{aligned}
$$

if $a_n \le \eta_n \le 2a_n$, and by

$$
\begin{aligned}
p_{H,n}(\theta; 1, s\eta_n, sa_n, sa_n)
={}& \{\Phi(n^{1/2}sa_n) - \Phi(-n^{1/2}sa_n)\}\, \{\mathbf{1}(0 \le \theta \le sa_n - s\eta_n) + \mathbf{1}(s\eta_n + sa_n < \theta)\} \\
&+ \{\Phi(n^{1/2}sa_n) - \Phi(n^{1/2}(-\theta - s\eta_n))\}\, \mathbf{1}(sa_n - s\eta_n < \theta \le sa_n) \\
&+ \{\Phi(n^{1/2}sa_n) - \Phi(n^{1/2}(-\theta + s\eta_n))\}\, \mathbf{1}(sa_n < \theta \le s\eta_n + sa_n)
\end{aligned}
$$

if $\eta_n < a_n$. In the subsequent theorems we consider only the case where $\eta_n \to 0$ as this is the only interesting case from an asymptotic perspective: note that any of the penalized maximum likelihood estimators considered in this paper is inconsistent for $\theta$ if $\eta_n$ does not converge to zero.
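The piecewise expressions above can be cross-checked without transcribing them. The Python sketch below computes the known-variance coverage of $[\hat\theta_H - a_n, \hat\theta_H + a_n]$ directly from the definition of the estimator, assuming $\hat\theta_H = \bar y\, \mathbf{1}(|\bar y| > \eta_n)$ with $\bar y \sim N(\theta, 1/n)$ (the usual hard-thresholding form, taken here as an assumption), and compares the minimum over a grid of $\theta$ with the closed-form infimum $\max[\Phi(n^{1/2}a_n) - \Phi(-n^{1/2}(a_n - \eta_n)), 0]$ that appears as (27) in the proof of Theorem 9. All function names and the three parameter configurations, one for each regime of $\eta_n$ relative to $a_n$, are illustrative choices.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def prob_between(lo, hi, theta, n):
    """P(lo < ybar <= hi) for ybar ~ N(theta, 1/n); zero for an empty range."""
    if hi <= lo:
        return 0.0
    r = math.sqrt(n)
    return norm_cdf(r * (hi - theta)) - norm_cdf(r * (lo - theta))


def coverage_hard(theta, n, eta, a):
    """Coverage of [est - a, est + a], est = ybar * 1(|ybar| > eta) (assumed form)."""
    lo, hi = theta - a, theta + a
    # ybar survives thresholding and lies within distance a of theta
    p = (prob_between(lo, min(hi, -eta), theta, n)
         + prob_between(max(lo, eta), hi, theta, n))
    # ybar is thresholded to zero; theta is covered only if |theta| <= a
    if abs(theta) <= a:
        p += prob_between(-eta, eta, theta, n)
    return p


def inf_coverage_hard(n, eta, a):
    """Closed-form infimal coverage, as in (27)."""
    r = math.sqrt(n)
    return max(0.0, norm_cdf(r * a) - norm_cdf(-r * (a - eta)))


def grid_minimum(n, eta, a, points=4000):
    """Minimum coverage over a grid of theta >= 0 (coverage is symmetric in theta)."""
    top = a + eta + 6.0 / math.sqrt(n)
    thetas = [top * k / points for k in range(points + 1)]
    thetas.append(a * (1.0 + 1e-9))  # the infimum is approached as theta decreases to a
    return min(coverage_hard(t, n, eta, a) for t in thetas)


# (n, eta_n, a_n) with eta_n > 2 a_n, a_n <= eta_n <= 2 a_n, and eta_n < a_n
configs = [(100, 0.30, 0.10), (100, 0.15, 0.10), (100, 0.10, 0.25)]
results = [(grid_minimum(n, eta, a), inf_coverage_hard(n, eta, a))
           for n, eta, a in configs]
for found, closed in results:
    print(round(found, 6), round(closed, 6))
```

The infimum is not attained but approached as $\theta$ decreases to $a_n$, which is why the grid is augmented by a point just to the right of $a_n$.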

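Theorem 9 Suppose $\eta_n \to 0$. For every sequence $a_n$ of nonnegative real numbers we have with $E_{H,n} = [\tilde\theta_H - \hat\sigma a_n, \tilde\theta_H + \hat\sigma a_n]$ and $C_{H,n} = [\hat\theta_H - a_n, \hat\theta_H + a_n]$ that

$$
\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in E_{H,n}) - \inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in C_{H,n}) \to 0
$$

as $n \to \infty$. The analogous results hold for the corresponding open and half-open intervals.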



4.3 Adaptive LASSO 

Consider the interval $E_{A,n} = [\tilde\theta_A - \hat\sigma a_n, \tilde\theta_A + \hat\sigma a_n]$ where $a_n$ is a nonnegative real number and $\tilde\theta_A = \tilde\theta_A(\eta_n)$. We then have analogously as in the preceding subsections that

$$
P_{n,\theta}(\theta \in E_{A,n}) = \int_0^\infty p_{A,n}(\theta; 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds
$$

where $p_{A,n}$ is given in (16) in the Appendix.

Theorem 10 Suppose $\eta_n \to 0$. For every sequence $a_n$ of nonnegative real numbers we have with $E_{A,n} = [\tilde\theta_A - \hat\sigma a_n, \tilde\theta_A + \hat\sigma a_n]$ and $C_{A,n} = [\hat\theta_A - a_n, \hat\theta_A + a_n]$ that

$$
\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in E_{A,n}) - \inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in C_{A,n}) \to 0
$$

as $n \to \infty$. The analogous results hold for the corresponding open and half-open intervals.

4.4 Discussion 

Theorems 8, 9, and 10 show that the results in Section 3 carry over to the unknown-variance case in an asymptotic sense: For example, suppose $0 < \delta < 1$, and $a_{n,S}$ ($a_{n,H}$, $a_{n,A}$, respectively) is such that $E_{S,n}$ ($E_{H,n}$, $E_{A,n}$, respectively) has infimal coverage probability converging to $\delta$. Then, for a regime where $n^{1/2}\eta_n \to e$ with $0 \le e < \infty$, it follows that $n^{1/2}a_{n,S}$, $n^{1/2}a_{n,H}$, and $n^{1/2}a_{n,A}$ have limits that solve (6)-(8), respectively; that is, they have the same limits as $n^{1/2}a^*_{n,S}$, $n^{1/2}a^*_{n,H}$, and $n^{1/2}a^*_{n,A}$, which are $n^{1/2}$ times the half-length of the shortest $\delta$-confidence intervals $C^*_{S,n}$, $C^*_{H,n}$, and $C^*_{A,n}$, respectively, in the known-variance case. Furthermore, for a regime where $n^{1/2}\eta_n \to \infty$ it follows that $a_{n,S}$, $a_{n,H}$, and $a_{n,A}$ satisfy (9)-(11), respectively (where we also assume $\eta_n \to 0$ for hard-thresholding and the adaptive LASSO). Hence, $a_{n,S}$, $a_{n,H}$, and $a_{n,A}$ on the one hand, and $a^*_{n,S}$, $a^*_{n,H}$, and $a^*_{n,A}$ on the other hand have again the same asymptotic behavior. Furthermore, Theorems 8, 9, and 10 show that Proposition 6 immediately carries over to the unknown-variance case.

A Appendix 

Proof of Proposition 1. Using the expression for the finite sample distribution of $n^{1/2}(\hat\theta_S - \theta)$ given in Pötscher and Leeb (2009) and noting that this distribution function has a jump at $-n^{1/2}\theta$ we obtain

$$
\begin{aligned}
p_{S,n}(\theta) ={}& [\Phi(n^{1/2}(a_n - \eta_n)) - \Phi(n^{1/2}(-b_n - \eta_n))]\, \mathbf{1}(\theta < -a_n) \\
&+ [\Phi(n^{1/2}(a_n + \eta_n)) - \Phi(n^{1/2}(-b_n - \eta_n))]\, \mathbf{1}(-a_n \le \theta \le b_n) \\
&+ [\Phi(n^{1/2}(a_n + \eta_n)) - \Phi(n^{1/2}(-b_n + \eta_n))]\, \mathbf{1}(b_n < \theta). \qquad (15)
\end{aligned}
$$

It follows that $\inf_{\theta \in \mathbb{R}} p_{S,n}(\theta)$ is as given in the proposition. ■

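Proof of Proposition 3. The distribution function $F_{A,n,\theta}(x) = P_{n,\theta}(n^{1/2}(\hat\theta_A - \theta) \le x)$ of the adaptive LASSO estimator is given by

$$
\begin{aligned}
&\mathbf{1}(x + n^{1/2}\theta \ge 0)\, \Phi\left(-((n^{1/2}\theta - x)/2) + \sqrt{((n^{1/2}\theta + x)/2)^2 + n\eta_n^2}\right) \\
&\quad + \mathbf{1}(x + n^{1/2}\theta < 0)\, \Phi\left(-((n^{1/2}\theta - x)/2) - \sqrt{((n^{1/2}\theta + x)/2)^2 + n\eta_n^2}\right)
\end{aligned}
$$

(see Pötscher and Schneider (2009)). Hence, the coverage probability $p_{A,n}(\theta) = F_{A,n,\theta}(n^{1/2}a_n) - \lim_{x \to (-n^{1/2}b_n)_-} F_{A,n,\theta}(x)$ is

$$
p_{A,n}(\theta) =
\begin{cases}
\Phi(n^{1/2}\gamma^{(-)}(\theta, -a_n)) - \Phi(n^{1/2}\gamma^{(-)}(\theta, b_n)) & \text{if } \theta < -a_n \\
\Phi(n^{1/2}\gamma^{(+)}(\theta, -a_n)) - \Phi(n^{1/2}\gamma^{(-)}(\theta, b_n)) & \text{if } -a_n \le \theta \le b_n \\
\Phi(n^{1/2}\gamma^{(+)}(\theta, -a_n)) - \Phi(n^{1/2}\gamma^{(+)}(\theta, b_n)) & \text{if } \theta > b_n.
\end{cases}
\qquad (16)
$$

Here

$$
\gamma^{(-)}(\theta, x) = -((\theta + x)/2) - \sqrt{((\theta - x)/2)^2 + \eta_n^2} \qquad (17)
$$

$$
\gamma^{(+)}(\theta, x) = -((\theta + x)/2) + \sqrt{((\theta - x)/2)^2 + \eta_n^2} \qquad (18)
$$

which are clearly smooth functions of $(\theta, x)$. Observe that $\gamma^{(-)}$ and $\gamma^{(+)}$ are nonincreasing in $\theta \in \mathbb{R}$ (for every $x \in \mathbb{R}$). As a consequence, we obtain for $-a_n \le \theta \le b_n$ the lower bound

$$
\begin{aligned}
p_{A,n}(\theta) &\ge \Phi(n^{1/2}\gamma^{(+)}(b_n, -a_n)) - \Phi(n^{1/2}\gamma^{(-)}(-a_n, b_n)) \\
&= \Phi\left(n^{1/2}\left((a_n - b_n)/2 + \sqrt{((a_n + b_n)/2)^2 + \eta_n^2}\right)\right) \\
&\quad - \Phi\left(n^{1/2}\left((a_n - b_n)/2 - \sqrt{((a_n + b_n)/2)^2 + \eta_n^2}\right)\right). \qquad (19)
\end{aligned}
$$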

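Consider first the case where $a_n \le b_n$. We then show that $p_{A,n}(\theta)$ is nonincreasing on $(-\infty, -a_n)$: The derivative $dp_{A,n}(\theta)/d\theta$ is given by

$$
\frac{dp_{A,n}(\theta)}{d\theta} = n^{1/2}\left[\phi(n^{1/2}\gamma^{(-)}(\theta, -a_n))\, \frac{\partial \gamma^{(-)}(\theta, -a_n)}{\partial\theta} - \phi(n^{1/2}\gamma^{(-)}(\theta, b_n))\, \frac{\partial \gamma^{(-)}(\theta, b_n)}{\partial\theta}\right]
$$

where $\phi$ denotes the standard normal density function. Using the relation $a_n \le b_n$, elementary calculations show that

$$
\partial\gamma^{(-)}(\theta, -a_n)/\partial\theta \le \partial\gamma^{(-)}(\theta, b_n)/\partial\theta \quad \text{for } \theta \in (-\infty, -a_n).
$$

Furthermore, given $a_n \le b_n$, it is not too difficult to see that $|\gamma^{(-)}(\theta, -a_n)| \le |\gamma^{(-)}(\theta, b_n)|$ for $\theta \in (-\infty, -a_n)$ (cf. Lemma 11 below), which implies that

$$
\phi(n^{1/2}\gamma^{(-)}(\theta, -a_n)) \ge \phi(n^{1/2}\gamma^{(-)}(\theta, b_n)).
$$

The last two displays together with the fact that $\partial\gamma^{(-)}(\theta, -a_n)/\partial\theta$ as well as $\partial\gamma^{(-)}(\theta, b_n)/\partial\theta$ are less than or equal to zero, imply that $dp_{A,n}(\theta)/d\theta \le 0$ on $(-\infty, -a_n)$. This proves that

$$
\inf_{\theta < -a_n} p_{A,n}(\theta) = \lim_{\theta \to (-a_n)_-} p_{A,n}(\theta) = c
$$

with

$$
c = \Phi(n^{1/2}(a_n - \eta_n)) - \Phi\left(n^{1/2}\left((a_n - b_n)/2 - \sqrt{((a_n + b_n)/2)^2 + \eta_n^2}\right)\right). \qquad (20)
$$

Since the lower bound given in (19) is not less than $c$, we have

$$
\inf_{\theta \le b_n} p_{A,n}(\theta) = \inf_{\theta < -a_n} p_{A,n}(\theta) = c.
$$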

It remains to show that $p_{A,n}(\theta) \ge c$ for $\theta > b_n$. From (16) and (20) after rearranging terms we obtain for $\theta > b_n$

$$
\begin{aligned}
p_{A,n}(\theta) - c ={}& \left[\Phi(n^{1/2}\gamma^{(+)}(\theta, -a_n)) - \Phi(n^{1/2}\gamma^{(-)}(-a_n, -a_n))\right] \\
&- \left[\Phi(n^{1/2}\gamma^{(+)}(\theta, b_n)) - \Phi(n^{1/2}\gamma^{(-)}(-a_n, b_n))\right].
\end{aligned}
$$

It is elementary to show that $\gamma^{(+)}(\theta, -a_n) \ge \gamma^{(-)}(-a_n, -a_n) = a_n - \eta_n$ and $\gamma^{(+)}(\theta, b_n) \ge \gamma^{(-)}(-a_n, b_n)$. We next show that

$$
\gamma^{(+)}(\theta, -a_n) - \gamma^{(-)}(-a_n, -a_n) \ge \gamma^{(+)}(\theta, b_n) - \gamma^{(-)}(-a_n, b_n). \qquad (21)
$$

To establish this note that (21) can equivalently be rewritten as

$$
f(0) + f((\theta + a_n)/2) \ge f((\theta - b_n)/2) + f((a_n + b_n)/2) \qquad (22)
$$

where $f(x) = (x^2 + \eta_n^2)^{1/2}$. Observe that $0 \le (\theta - b_n)/2 \le (\theta + a_n)/2$ holds since $0 \le a_n \le b_n \le \theta$. Writing $(\theta - b_n)/2$ as $\lambda(\theta + a_n)/2 + (1 - \lambda)0$ with $0 \le \lambda \le 1$ gives $(a_n + b_n)/2 = (1 - \lambda)(\theta + a_n)/2 + \lambda 0$. Because $f$ is convex, the inequality (22) and hence (21) follows.

Next observe that in case $a_n \ge \eta_n$ we have (using monotonicity of $\gamma^{(+)}(\theta, b_n)$)

$$
0 \le \gamma^{(-)}(-a_n, -a_n) = a_n - \eta_n \le b_n - \eta_n = -\gamma^{(+)}(b_n, b_n) \le -\gamma^{(+)}(\theta, b_n) \qquad (23)
$$

for $\theta > b_n$. In case $a_n < \eta_n$ we have (using $\gamma^{(-)}(\theta, x) = \gamma^{(-)}(x, \theta)$ and monotonicity of $\gamma^{(-)}$ in its first argument)

$$
\gamma^{(-)}(-a_n, b_n) \le \gamma^{(-)}(-a_n, -a_n) = a_n - \eta_n < 0, \qquad (24)
$$

and (using monotonicity of $\gamma^{(+)}$)

$$
\gamma^{(-)}(-a_n, b_n) \le -\gamma^{(+)}(b_n, -a_n) \le -\gamma^{(+)}(\theta, -a_n) \qquad (25)
$$

for $\theta > b_n$. Applying Lemma 12 below with $\alpha = n^{1/2}\gamma^{(-)}(-a_n, -a_n)$, $\beta = n^{1/2}\gamma^{(+)}(\theta, -a_n)$, $\gamma = n^{1/2}\gamma^{(-)}(-a_n, b_n)$, and $\delta = n^{1/2}\gamma^{(+)}(\theta, b_n)$ and using (21)-(25), establishes $p_{A,n}(\theta) - c \ge 0$. This completes the proof in case $a_n \le b_n$.

The case $a_n > b_n$ follows from the observation that (16) remains unchanged if $a_n$ and $b_n$ are interchanged and $\theta$ is replaced by $-\theta$. ■
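The coverage formula (16) with the functions $\gamma^{(-)}$ and $\gamma^{(+)}$ from (17)-(18) lends itself to a direct numerical check. The Python sketch below evaluates (16) and compares it with a second computation that inverts the estimator map, assuming the adaptive LASSO takes the form $\hat\theta_A = \bar y\,(1 - \eta_n^2/\bar y^2)_+$ with $\bar y \sim N(\theta, 1/n)$ (an assumption about the model underlying this section). For a symmetric interval it also compares the grid minimum with the value $c$ of (20) evaluated at $a_n = b_n$. The function names and parameter values are illustrative.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gamma_minus(theta, x, eta):
    """The function gamma^(-) of (17)."""
    return -(theta + x) / 2.0 - math.sqrt(((theta - x) / 2.0) ** 2 + eta ** 2)


def gamma_plus(theta, x, eta):
    """The function gamma^(+) of (18)."""
    return -(theta + x) / 2.0 + math.sqrt(((theta - x) / 2.0) ** 2 + eta ** 2)


def coverage_formula(theta, n, eta, a, b):
    """Coverage of [est - a, est + b] according to (16)."""
    r = math.sqrt(n)
    upper = gamma_minus(theta, -a, eta) if theta < -a else gamma_plus(theta, -a, eta)
    lower = gamma_plus(theta, b, eta) if theta > b else gamma_minus(theta, b, eta)
    return norm_cdf(r * upper) - norm_cdf(r * lower)


def inverse_map(v, eta, upper_end):
    """Largest (upper_end) or smallest ybar with ybar * max(0, 1 - eta^2/ybar^2) = v."""
    root = math.sqrt(v * v + 4.0 * eta * eta)
    if v > 0 or (v == 0 and upper_end):
        return 0.5 * (v + root)
    return 0.5 * (v - root)


def coverage_by_inversion(theta, n, eta, a, b):
    """Same coverage from {theta - b <= est <= theta + a}, using monotonicity in ybar."""
    r = math.sqrt(n)
    y_hi = inverse_map(theta + a, eta, upper_end=True)
    y_lo = inverse_map(theta - b, eta, upper_end=False)
    return norm_cdf(r * (y_hi - theta)) - norm_cdf(r * (y_lo - theta))


n, eta = 100, 0.1
thetas = [-1.0 + 0.001 * k for k in range(2001)]

# (i) formula (16) against the inversion argument, asymmetric interval
max_diff = max(abs(coverage_formula(t, n, eta, 0.2, 0.3)
                   - coverage_by_inversion(t, n, eta, 0.2, 0.3)) for t in thetas)

# (ii) grid minimum against c from (20), symmetric interval a = b
a = 0.25
grid = thetas + [a * (1.0 + 1e-9), -a * (1.0 + 1e-9)]
grid_min = min(coverage_formula(t, n, eta, a, a) for t in grid)
c_value = (norm_cdf(math.sqrt(n) * (a - eta))
           - norm_cdf(-math.sqrt(n) * math.sqrt(a * a + eta * eta)))
print(max_diff, grid_min, c_value)
```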

Lemma 11 Suppose $a_n \le b_n$. Then $|\gamma^{(-)}(\theta, -a_n)| \le |\gamma^{(-)}(\theta, b_n)|$ holds for $\theta \in (-\infty, -a_n)$.

Proof. Squaring both sides of the claimed inequality shows that the claim is equivalent to

$$
a_n^2/2 - (a_n - \theta)\sqrt{((a_n + \theta)/2)^2 + \eta_n^2} \le b_n^2/2 + (b_n + \theta)\sqrt{((b_n - \theta)/2)^2 + \eta_n^2}.
$$

But, for $\theta < -a_n$, the left-hand side of the preceding display is not larger than

$$
a_n^2/2 + (a_n + \theta)\sqrt{((a_n - \theta)/2)^2 + \eta_n^2}.
$$

Since $a_n^2/2 \le b_n^2/2$, it hence suffices to show that

$$
-(a_n + \theta)\sqrt{((a_n - \theta)/2)^2 + \eta_n^2} \ge -(b_n + \theta)\sqrt{((b_n - \theta)/2)^2 + \eta_n^2}
$$

for $\theta < -a_n$. This is immediately seen by distinguishing the cases where $-b_n \le \theta < -a_n$ and where $\theta < -b_n$, and observing that $a_n \le b_n$. ■

The following lemma is elementary to prove.

Lemma 12 Suppose $\alpha$, $\beta$, $\gamma$, and $\delta$ are real numbers satisfying $\alpha \le \beta$, $\gamma \le \delta$, and $\beta - \alpha \ge \delta - \gamma$. If $0 \le \alpha \le -\delta$, or if $\gamma \le \alpha \le 0$ and $\gamma \le -\beta$, then $\Phi(\beta) - \Phi(\alpha) \ge \Phi(\delta) - \Phi(\gamma)$.

Proof of Theorem 5. (a) Since $\delta$ is positive, any solution to (3) has to be positive. Now the equation (3) has a unique solution $a^*_{n,S}$, since (3) as a function of $a_n \in [0, \infty)$ is easily seen to be strictly increasing with range $[0, 1)$. Furthermore, the infimal coverage probability (1) is a continuous function of the pair $(a_n, b_n)$ on $[0, \infty) \times [0, \infty)$. Let $K \subseteq [0, \infty) \times [0, \infty)$ consist of all pairs $(a_n, b_n)$ such that (i) the corresponding interval $[\hat\theta_S - a_n, \hat\theta_S + b_n]$ has infimal coverage probability not less than $\delta$, and (ii) the length $a_n + b_n$ is less than or equal to $2a^*_{n,S}$. Then $K$ is compact. It is also nonempty as the pair $(a^*_{n,S}, a^*_{n,S})$ belongs to $K$. Since the length $a_n + b_n$ is obviously continuous, it follows that there is a pair $(a_n^\circ, b_n^\circ) \in K$ having minimal length within $K$. Since confidence sets corresponding to pairs not belonging to $K$ always have length larger than $2a^*_{n,S}$, the pair $(a_n^\circ, b_n^\circ)$ gives rise to an interval with shortest length within the set of all intervals with infimal coverage probability not less than $\delta$. We next show that $a_n^\circ = b_n^\circ$ must hold: Suppose not, then we may assume without loss of generality that $a_n^\circ < b_n^\circ$, since (1) remains invariant under permutation of $a_n^\circ$ and $b_n^\circ$. But now increasing $a_n^\circ$ by $\varepsilon > 0$ and decreasing $b_n^\circ$ by the same amount such that $a_n^\circ + \varepsilon < b_n^\circ - \varepsilon$ holds, will result in an interval of the same length with infimal coverage probability

$$
\Phi(n^{1/2}(a_n^\circ + \varepsilon - \eta_n)) - \Phi(n^{1/2}(-(b_n^\circ - \varepsilon) - \eta_n)).
$$

This infimal coverage probability will be strictly larger than

$$
\Phi(n^{1/2}(a_n^\circ - \eta_n)) - \Phi(n^{1/2}(-b_n^\circ - \eta_n)) \ge \delta
$$

provided $\varepsilon$ is chosen sufficiently small. But then, by continuity of the infimal coverage probability as a function of $a_n$ and $b_n$, the interval $[\hat\theta_S - a_n^\circ - \varepsilon, \hat\theta_S + b_n' - \varepsilon]$ with $\varepsilon < b_n' < b_n^\circ$ will still have infimal coverage probability not less than $\delta$ as long as $b_n'$ is sufficiently close to $b_n^\circ$; at the same time this interval will be shorter than the interval $[\hat\theta_S - a_n^\circ, \hat\theta_S + b_n^\circ]$. This leads to a contradiction and establishes $a_n^\circ = b_n^\circ$. By what was said at the beginning of the proof, it is now obvious that $a_n^\circ = b_n^\circ = a^*_{n,S}$ must hold, thus also establishing uniqueness. The last claim is obvious in view of the construction of $a^*_{n,S}$.

(b) Since $\delta$ is positive, any solution to (4) has to be larger than $\eta_n/2$. Now equation (4) has a unique solution $a^*_{n,H}$, since (4) as a function of $a_n \in [\eta_n/2, \infty)$ is easily seen to be strictly increasing with range $[0, 1)$. Furthermore, define $K$ similarly as in the proof of part (a). Then, by the same reasoning as in (a), the set $K$ is compact and non-empty, leading to a pair $(a_n^\circ, b_n^\circ)$ that gives rise to an interval with shortest length within the set of all intervals with infimal coverage probability not less than $\delta$. We next show that $a_n^\circ = b_n^\circ$ must hold: Suppose not, then we may again assume without loss of generality that $a_n^\circ < b_n^\circ$. Note that $a_n^\circ + b_n^\circ > \eta_n$ must hold, since the infimal coverage probability of the corresponding interval is positive by construction. Since all this entails $|a_n^\circ - \eta_n| < b_n^\circ$, increasing $a_n^\circ$ by $\varepsilon > 0$ and decreasing $b_n^\circ$ by the same amount such that $a_n^\circ + \varepsilon < b_n^\circ - \varepsilon$ holds, will result in an interval of the same length with infimal coverage probability

$$
\Phi(n^{1/2}(a_n^\circ + \varepsilon - \eta_n)) - \Phi(-n^{1/2}(b_n^\circ - \varepsilon)) > \Phi(n^{1/2}(a_n^\circ - \eta_n)) - \Phi(-n^{1/2}b_n^\circ) \ge \delta
$$

provided $\varepsilon$ is chosen sufficiently small. By continuity of the infimal coverage probability as a function of $a_n$ and $b_n$, the interval $[\hat\theta_H - a_n^\circ - \varepsilon, \hat\theta_H + b_n' - \varepsilon]$ with $\varepsilon < b_n' < b_n^\circ$ will still have infimal coverage probability not less than $\delta$ as long as $b_n'$ is sufficiently close to $b_n^\circ$; at the same time this interval will be shorter than the interval $[\hat\theta_H - a_n^\circ, \hat\theta_H + b_n^\circ]$, leading to a contradiction thus establishing $a_n^\circ = b_n^\circ$. As in (a) it now follows that $a_n^\circ = b_n^\circ = a^*_{n,H}$ must hold, thus also establishing uniqueness. The last claim is then obvious in view of the construction of $a^*_{n,H}$.

(c) Since $\delta$ is positive, it is easy to see that any solution to (5) has to be positive. Now equation (5) has a unique solution $a^*_{n,A}$, since (5) as a function of $a_n \in [0, \infty)$ is strictly increasing with range $[0, 1)$. Furthermore, the infimal coverage probability as given in Proposition 3 is a continuous function of the pair $(a_n, b_n)$ on $[0, \infty) \times [0, \infty)$. Define $K$ similarly as in the proof of part (a). Then by the same reasoning as in (a), the set $K$ is compact and non-empty, leading to a pair $(a_n^\circ, b_n^\circ)$ that gives rise to an interval with shortest length within the set of all intervals with infimal coverage probability not less than $\delta$. We next show that $a_n^\circ = b_n^\circ$ must hold: Suppose not, then we may again assume without loss of generality that $a_n^\circ < b_n^\circ$. But now increasing $a_n^\circ$ by $\varepsilon > 0$ and decreasing $b_n^\circ$ by the same amount such that $a_n^\circ + \varepsilon < b_n^\circ - \varepsilon$ holds, will result in an interval of the same length with infimal coverage probability

$$
\begin{aligned}
&\Phi(n^{1/2}(a_n^\circ + \varepsilon - \eta_n)) - \Phi\left(n^{1/2}\left(\varepsilon + (a_n^\circ - b_n^\circ)/2 - \sqrt{((a_n^\circ + b_n^\circ)/2)^2 + \eta_n^2}\right)\right) \\
&\quad > \Phi(n^{1/2}(a_n^\circ - \eta_n)) - \Phi\left(n^{1/2}\left((a_n^\circ - b_n^\circ)/2 - \sqrt{((a_n^\circ + b_n^\circ)/2)^2 + \eta_n^2}\right)\right) \ge \delta
\end{aligned}
$$

provided $\varepsilon$ is chosen sufficiently small. This is so since $a_n^\circ < b_n^\circ$ implies

$$
|a_n^\circ - \eta_n| < \left|(a_n^\circ - b_n^\circ)/2 - \sqrt{((a_n^\circ + b_n^\circ)/2)^2 + \eta_n^2}\right|
$$

as is easily seen. But then, by continuity of the infimal coverage probability as a function of $a_n$ and $b_n$, the interval $[\hat\theta_A - a_n^\circ - \varepsilon, \hat\theta_A + b_n' - \varepsilon]$ with $\varepsilon < b_n' < b_n^\circ$ will still have infimal coverage probability not less than $\delta$ as long as $b_n'$ is sufficiently close to $b_n^\circ$; at the same time this interval will be shorter than the interval $[\hat\theta_A - a_n^\circ, \hat\theta_A + b_n^\circ]$. This leads to a contradiction and establishes $a_n^\circ = b_n^\circ$. As in (a) it now follows that $a_n^\circ = b_n^\circ = a^*_{n,A}$ must hold, thus also establishing uniqueness. The last claim is obvious in view of the construction of $a^*_{n,A}$. ■
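The half-lengths characterized in this proof are easy to compute by bisection. The Python sketch below assumes that equations (3), (4), and (5) equate $\delta$ with the symmetric infimal coverage probabilities appearing in parts (a)-(c), namely $\Phi(n^{1/2}(a_n - \eta_n)) - \Phi(n^{1/2}(-a_n - \eta_n))$, $\Phi(n^{1/2}(a_n - \eta_n)) - \Phi(-n^{1/2}a_n)$, and $\Phi(n^{1/2}\sqrt{a_n^2 + \eta_n^2}) - \Phi(n^{1/2}(-a_n + \eta_n))$; the equations themselves are not reproduced in this part of the paper, so these forms are an inference. The function names and the values $n = 100$, $\eta_n = 0.1$, $\delta = 0.95$ are illustrative.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def inf_cov_ml(n, eta, a):
    """Coverage of the standard interval based on the maximum likelihood estimator."""
    r = math.sqrt(n)
    return norm_cdf(r * a) - norm_cdf(-r * a)


def inf_cov_soft(n, eta, a, b=None):
    """Infimal coverage of [est - a, est + b]: the smaller outer piece of (15)."""
    b = a if b is None else b
    r = math.sqrt(n)
    left = norm_cdf(r * (a - eta)) - norm_cdf(r * (-b - eta))
    right = norm_cdf(r * (a + eta)) - norm_cdf(r * (-b + eta))
    return min(left, right)


def inf_cov_hard(n, eta, a):
    """Assumed form of (4): symmetric hard-thresholding interval."""
    r = math.sqrt(n)
    return max(0.0, norm_cdf(r * (a - eta)) - norm_cdf(-r * a))


def inf_cov_alasso(n, eta, a):
    """Assumed form of (5): symmetric adaptive LASSO interval."""
    r = math.sqrt(n)
    return norm_cdf(r * math.sqrt(a * a + eta * eta)) - norm_cdf(r * (eta - a))


def shortest_half_length(inf_cov, n, eta, delta, upper=10.0):
    """Bisection for the a with inf_cov(n, eta, a) = delta (inf_cov nondecreasing in a)."""
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if inf_cov(n, eta, mid) < delta:
            lo = mid
        else:
            hi = mid
    return hi


n, eta, delta = 100, 0.1, 0.95
a_ml = shortest_half_length(inf_cov_ml, n, eta, delta)
a_soft = shortest_half_length(inf_cov_soft, n, eta, delta)
a_alasso = shortest_half_length(inf_cov_alasso, n, eta, delta)
a_hard = shortest_half_length(inf_cov_hard, n, eta, delta)
print(a_ml, a_soft, a_alasso, a_hard)

# An asymmetric split of the same total length loses coverage (soft-thresholding).
asymmetric = inf_cov_soft(n, eta, a_soft - 0.05, a_soft + 0.05)
print(asymmetric)
```

For this configuration the computed half-lengths should reproduce the ordering stated in the abstract, with the standard interval shortest, followed by the LASSO, the adaptive LASSO, and hard-thresholding.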

Proof of Proposition 6. Let

$$
c = \liminf_{n \to \infty} \inf_{\theta \in \mathbb{R}} P_{n,\theta}\left(-d \le \eta_n^{-1}(\hat\theta - \theta) \le d\right).
$$

By definition of $c$, we can find a subsequence $n_k$ and elements $\theta_{n_k} \in \mathbb{R}$ such that

$$
P_{n_k,\theta_{n_k}}\left(-d \le \eta_{n_k}^{-1}(\hat\theta - \theta_{n_k}) \le d\right) \to c
$$

for $k \to \infty$. Now, by Theorem 17 (for $\hat\theta = \hat\theta_H$), Theorem 18 (for $\hat\theta = \hat\theta_S$), and Remark 12 in Pötscher and Leeb (2009), and by Theorem 6 (for $\hat\theta = \hat\theta_A$) and Remark 7 in Pötscher and Schneider (2009), any accumulation point of the distribution of $\eta_{n_k}^{-1}(\hat\theta - \theta_{n_k})$ with respect to weak convergence is a probability measure concentrated on $[-1, 1]$. Since $d > 1$, it follows that $c = 1$ must hold, which proves the first claim. We next prove the second claim. In view of Theorem 17 (for $\hat\theta = \hat\theta_H$) and Theorem 18 (for $\hat\theta = \hat\theta_S$) in Pötscher and Leeb (2009), and in view of Theorem 6 (for $\hat\theta = \hat\theta_A$) in Pötscher and Schneider (2009) it is possible to choose a sequence $\theta_n \in \mathbb{R}$ such that the distribution of $\eta_n^{-1}(\hat\theta - \theta_n)$ converges to point mass located at one of the endpoints of the interval $[-1, 1]$. But then clearly

$$
P_{n,\theta_n}\left(-d \le \eta_n^{-1}(\hat\theta - \theta_n) \le d\right) \to 0
$$

for $d < 1$ which implies the second claim. ■

Proof of Theorem 9. We prove the result for the closed interval. Inspection
of the proof together with Remark 0] then gives the result for the open and 
half-open intervals. 

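Step 1: Observe that for every $s > 0$ and $n \ge 2$ we have from the above formulae for $p_{H,n}$ that

$$
\lim_{\theta \to \infty} p_{H,n}(\theta; 1, s\eta_n, sa_n, sa_n) = \Phi(n^{1/2}sa_n) - \Phi(-n^{1/2}sa_n).
$$

By the dominated convergence theorem it follows that for $\theta \to \infty$

$$
\begin{aligned}
P_{n,\theta}(\theta \in E_{H,n}) &= \int_0^\infty p_{H,n}(\theta; 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \\
&\to \int_0^\infty \left[\Phi(n^{1/2}sa_n) - \Phi(-n^{1/2}sa_n)\right] h_n(s)\, ds = T_{n-1}(n^{1/2}a_n) - T_{n-1}(-n^{1/2}a_n).
\end{aligned}
$$

Hence,

$$
\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in C_{H,n}) \le \lim_{\theta \to \infty} p_{H,n}(\theta; 1, \eta_n, a_n, a_n) = \Phi(n^{1/2}a_n) - \Phi(-n^{1/2}a_n)
$$

and

$$
\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in E_{H,n}) \le T_{n-1}(n^{1/2}a_n) - T_{n-1}(-n^{1/2}a_n) \le \Phi(n^{1/2}a_n) - \Phi(-n^{1/2}a_n), \qquad (26)
$$

the last inequality following from well-known properties of $T_{n-1}$, cf. Lemma 14 below. This proves the theorem in case $n^{1/2}a_n \to 0$ for $n \to \infty$.

Step 2: For every $s > 0$ and $n \ge 2$ we have from Proposition 2

$$
\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in C_{H,n}) = \inf_{\theta \in \mathbb{R}} p_{H,n}(\theta; 1, \eta_n, a_n, a_n) = \max\left[\Phi(n^{1/2}a_n) - \Phi(-n^{1/2}(a_n - \eta_n)), 0\right] \qquad (27)
$$

and

$$
\inf_{\theta \in \mathbb{R}} p_{H,n}(\theta; 1, s\eta_n, sa_n, sa_n) = \max\left[\Phi(n^{1/2}sa_n) - \Phi(n^{1/2}(-sa_n + s\eta_n)), 0\right].
$$

Furthermore,

$$
\begin{aligned}
\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in E_{H,n}) &\ge \int_0^\infty \inf_{\theta \in \mathbb{R}} p_{H,n}(\theta; 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \\
&= \int_0^\infty \max\left[\Phi(n^{1/2}sa_n) - \Phi(n^{1/2}(-sa_n + s\eta_n)), 0\right] h_n(s)\, ds \\
&= \max\left[\int_0^\infty \left\{\Phi(n^{1/2}sa_n) - \Phi(n^{1/2}(-sa_n + s\eta_n))\right\} h_n(s)\, ds, 0\right] \\
&= \max\left[T_{n-1}(n^{1/2}a_n) - T_{n-1}(-n^{1/2}(a_n - \eta_n)), 0\right]. \qquad (28)
\end{aligned}
$$

If $n^{1/2}(a_n - \eta_n) \to \infty$, then the far right-hand sides of (27) and (28) converge to 1, since $\|\Phi - T_{n-1}\|_\infty \to 0$ as $n \to \infty$ by Polya's Theorem and since $n^{1/2}a_n \ge n^{1/2}(a_n - \eta_n)$. This proves the theorem in case $n^{1/2}(a_n - \eta_n) \to \infty$.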

Step 3: If $n^{1/2}\eta_n \to 0$, then (27) and the fact that $\Phi$ is globally Lipschitz show that $\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in C_{H,n})$ differs from $\Phi(n^{1/2}a_n) - \Phi(-n^{1/2}a_n)$ only by a term that is $o(1)$. Similarly, (26), (28), the fact that $\|\Phi - T_{n-1}\|_\infty \to 0$ as $n \to \infty$ by Polya's theorem, and the global Lipschitz property of $\Phi$ show that the same is true for $\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in E_{H,n})$, proving the theorem in case $n^{1/2}\eta_n \to 0$.

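Step 4: By a subsequence argument and Steps 1-3 it remains to prove the theorem under the assumption that $n^{1/2}a_n$ and $n^{1/2}\eta_n$ are bounded away from zero by a finite positive constant $c_1$, say, and that $n^{1/2}(a_n - \eta_n)$ is bounded from above by a finite constant $c_2$, say. It then follows that $a_n/\eta_n$ is bounded by a finite positive constant $c_3$, say. For given $\varepsilon > 0$ set $\theta_n(\varepsilon) = a_n(1 + 2c(\varepsilon)n^{-1/2})$ where $c(\varepsilon)$ is the constant given in Lemma 13. We then have for $s \in [1 - c(\varepsilon)n^{-1/2}, 1 + c(\varepsilon)n^{-1/2}]$

$$
sa_n < \theta_n(\varepsilon) \le s(\eta_n + a_n)
$$

whenever $n > n_0(c(\varepsilon), c_3)$. Without loss of generality we may choose $n_0(c(\varepsilon), c_3)$ large enough such that also $1 - c(\varepsilon)n^{-1/2} > 0$ holds for $n > n_0(c(\varepsilon), c_3)$. Consequently, we have (observing that $\max(0, x)$ has Lipschitz constant 1 and $\Phi$ has Lipschitz constant $(2\pi)^{-1/2}$) for every $s \in [1 - c(\varepsilon)n^{-1/2}, 1 + c(\varepsilon)n^{-1/2}]$ and $n > n_0(c(\varepsilon), c_3)$

$$
\begin{aligned}
&\left|p_{H,n}(\theta_n(\varepsilon); 1, s\eta_n, sa_n, sa_n) - p_{H,n}(\theta_n(\varepsilon); 1, \eta_n, a_n, a_n)\right| \\
&\quad = \left|\max(0, \Phi(n^{1/2}sa_n) - \Phi(n^{1/2}(-\theta_n(\varepsilon) + s\eta_n))) - \max(0, \Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-\theta_n(\varepsilon) + \eta_n)))\right| \\
&\quad \le \left|[\Phi(n^{1/2}sa_n) - \Phi(n^{1/2}(-\theta_n(\varepsilon) + s\eta_n))] - [\Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-\theta_n(\varepsilon) + \eta_n))]\right| \\
&\quad \le (2\pi)^{-1/2} n^{1/2}(a_n + \eta_n)|s - 1| \le (2\pi)^{-1/2} c(\varepsilon)(a_n + \eta_n) \le (2\pi)^{-1/2} c(\varepsilon)(c_3 + 1)\eta_n.
\end{aligned}
$$

It follows that for every $n > n_0(c(\varepsilon), c_3)$

$$
\begin{aligned}
\inf_{\theta \in \mathbb{R}} \int_0^\infty p_{H,n}(\theta; 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds
&\le \int_0^\infty p_{H,n}(\theta_n(\varepsilon); 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \\
&= \int_{1 - c(\varepsilon)n^{-1/2}}^{1 + c(\varepsilon)n^{-1/2}} p_{H,n}(\theta_n(\varepsilon); 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \\
&\quad + \int_{\{s:\, |s - 1| > c(\varepsilon)n^{-1/2}\}} p_{H,n}(\theta_n(\varepsilon); 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \\
&=: B_1 + B_2.
\end{aligned}
$$

Clearly, $0 \le B_2 \le \varepsilon$ holds, cf. Lemma 13, and for $B_1$ we have

$$
\begin{aligned}
&\left|B_1 - p_{H,n}(\theta_n(\varepsilon); 1, \eta_n, a_n, a_n)\right| \\
&\quad \le \int_{1 - c(\varepsilon)n^{-1/2}}^{1 + c(\varepsilon)n^{-1/2}} \left|p_{H,n}(\theta_n(\varepsilon); 1, s\eta_n, sa_n, sa_n) - p_{H,n}(\theta_n(\varepsilon); 1, \eta_n, a_n, a_n)\right| h_n(s)\, ds + \varepsilon \\
&\quad \le (2\pi)^{-1/2} c(\varepsilon)(c_3 + 1)\eta_n + \varepsilon
\end{aligned}
$$

for $n > n_0(c(\varepsilon), c_3)$. It follows that

$$
\inf_{\theta \in \mathbb{R}} \int_0^\infty p_{H,n}(\theta; 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \le p_{H,n}(\theta_n(\varepsilon); 1, \eta_n, a_n, a_n) + (2\pi)^{-1/2} c(\varepsilon)(c_3 + 1)\eta_n + 2\varepsilon
$$

holds for $n > n_0(c(\varepsilon), c_3)$. Now

$$
\begin{aligned}
p_{H,n}(\theta_n(\varepsilon); 1, \eta_n, a_n, a_n) &= \max(0, \Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-\theta_n(\varepsilon) + \eta_n))) \\
&= \max(0, \Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-a_n(1 + 2c(\varepsilon)n^{-1/2}) + \eta_n))).
\end{aligned}
$$

But this differs from $\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in C_{H,n}) = \max(0, \Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-a_n + \eta_n)))$ by at most

$$
\Phi(n^{1/2}(-a_n + \eta_n)) - \Phi(n^{1/2}(-a_n(1 + 2c(\varepsilon)n^{-1/2}) + \eta_n)) \le (2\pi)^{-1/2}\, 2c(\varepsilon)a_n \le (2\pi)^{-1/2}\, 2c(\varepsilon)c_3\eta_n.
$$

Consequently, for $n > n_0(c(\varepsilon), c_3)$

$$
\begin{aligned}
\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in E_{H,n}) &= \inf_{\theta \in \mathbb{R}} \int_0^\infty p_{H,n}(\theta; 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \\
&\le \max(0, \Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-a_n + \eta_n))) + (2\pi)^{-1/2} c(\varepsilon)(3c_3 + 1)\eta_n + 2\varepsilon \\
&= \inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in C_{H,n}) + (2\pi)^{-1/2} c(\varepsilon)(3c_3 + 1)\eta_n + 2\varepsilon.
\end{aligned}
$$

On the other hand,

$$
\begin{aligned}
\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in E_{H,n}) &= \inf_{\theta \in \mathbb{R}} \int_0^\infty p_{H,n}(\theta; 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \\
&\ge \int_0^\infty \inf_{\theta \in \mathbb{R}} p_{H,n}(\theta; 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \\
&= \int_0^\infty \max(0, \Phi(n^{1/2}sa_n) - \Phi(n^{1/2}s(-a_n + \eta_n)))\, h_n(s)\, ds \\
&= \max(0, T_{n-1}(n^{1/2}a_n) - T_{n-1}(n^{1/2}(-a_n + \eta_n))) \\
&\ge \max(0, \Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-a_n + \eta_n))) - 2\|\Phi - T_{n-1}\|_\infty \\
&= \inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in C_{H,n}) - 2\|\Phi - T_{n-1}\|_\infty.
\end{aligned}
$$

Since $\eta_n \to 0$ and $\|\Phi - T_{n-1}\|_\infty \to 0$ for $n \to \infty$ and since $\varepsilon$ was arbitrary the proof is complete. ■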

Proof of Theorem 10. We prove the result for the closed interval. Inspection
of the proof together with Remark [U then gives the result for the open and 
half-open intervals. 

Step 1: Observe that for every $s > 0$ and $n \ge 2$ we have from (16) that

$$
\lim_{\theta \to \infty} p_{A,n}(\theta; 1, s\eta_n, sa_n, sa_n) = \Phi(n^{1/2}sa_n) - \Phi(-n^{1/2}sa_n).
$$

Then exactly the same argument as in the proof of Theorem 9 shows that $\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in C_{A,n})$ as well as $\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in E_{A,n})$ converge to zero for $n \to \infty$ if $n^{1/2}a_n \to 0$, thus proving the theorem in this case. For later use we note that this reasoning in particular gives

$$
\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in E_{A,n}) \le T_{n-1}(n^{1/2}a_n) - T_{n-1}(-n^{1/2}a_n) \le \Phi(n^{1/2}a_n) - \Phi(-n^{1/2}a_n). \qquad (29)
$$

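Step 2: By Proposition 3 we have for every $s > 0$ and $n \ge 1$

$$
\inf_{\theta \in \mathbb{R}} p_{A,n}(\theta; 1, s\eta_n, sa_n, sa_n) = \Phi\left(n^{1/2}s\sqrt{a_n^2 + \eta_n^2}\right) - \Phi(n^{1/2}s(-a_n + \eta_n)).
$$

Arguing as in the proof of Theorem 9 we then have

$$
\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in C_{A,n}) = \inf_{\theta \in \mathbb{R}} p_{A,n}(\theta; 1, \eta_n, a_n, a_n) = \Phi\left(n^{1/2}\sqrt{a_n^2 + \eta_n^2}\right) - \Phi(n^{1/2}(-a_n + \eta_n)) \qquad (30)
$$

and

$$
\begin{aligned}
\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in E_{A,n}) &\ge \int_0^\infty \inf_{\theta \in \mathbb{R}} p_{A,n}(\theta; 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \\
&= T_{n-1}\left(n^{1/2}\sqrt{a_n^2 + \eta_n^2}\right) - T_{n-1}(n^{1/2}(-a_n + \eta_n)). \qquad (31)
\end{aligned}
$$

If $n^{1/2}(a_n - \eta_n) \to \infty$, then the far right-hand sides of (30) and (31) converge to 1, since $\|\Phi - T_{n-1}\|_\infty \to 0$ as $n \to \infty$ by Polya's Theorem and since $n^{1/2}\sqrt{a_n^2 + \eta_n^2} \ge n^{1/2}a_n \to \infty$ and $n^{1/2}(-a_n + \eta_n) \to -\infty$. This proves the theorem in case $n^{1/2}(a_n - \eta_n) \to \infty$.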

Step 3: Analogous to the corresponding step in the proof of Theorem 9, using (29), (30), (31), and additionally noting that $0 \le n^{1/2}\sqrt{a_n^2 + \eta_n^2} - n^{1/2}a_n \le n^{1/2}\eta_n$, the theorem is proved in the case $n^{1/2}\eta_n \to 0$.

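Step 4: Similar as in the proof of Theorem 9 it remains to prove the theorem under the assumption that $n^{1/2}a_n \ge c_1 > 0$, $n^{1/2}\eta_n \ge c_1$, and that $n^{1/2}(a_n - \eta_n) \le c_2 < \infty$. Again, it then follows that $0 < a_n/\eta_n \le c_3 < \infty$. For given $\varepsilon > 0$ set $\theta_n(\varepsilon) = a_n(1 + 2c(\varepsilon)n^{-1/2})$ where $c(\varepsilon)$ is the constant given in Lemma 13. We then have for $s \in [1 - c(\varepsilon)n^{-1/2}, 1 + c(\varepsilon)n^{-1/2}]$

$$
sa_n < \theta_n(\varepsilon)
$$

for all $n$. Choose $n_0(c(\varepsilon))$ large enough such that $1 - c(\varepsilon)n^{-1/2} > 1/2$ holds for $n > n_0(c(\varepsilon))$. Consequently, for every $s \in [1 - c(\varepsilon)n^{-1/2}, 1 + c(\varepsilon)n^{-1/2}]$ and $n > n_0(c(\varepsilon))$ we have from (16) (observing that $\Phi$ has Lipschitz constant $(2\pi)^{-1/2}$)

$$
\begin{aligned}
&\left|p_{A,n}(\theta_n(\varepsilon); 1, s\eta_n, sa_n, sa_n) - p_{A,n}(\theta_n(\varepsilon); 1, \eta_n, a_n, a_n)\right| \\
&\quad \le (2\pi)^{-1/2} n^{1/2}\Big(|s - 1|\,a_n + \left|\sqrt{(\theta_n(\varepsilon) + sa_n)^2/4 + s^2\eta_n^2} - \sqrt{(\theta_n(\varepsilon) + a_n)^2/4 + \eta_n^2}\right| \\
&\qquad\qquad + \left|\sqrt{(\theta_n(\varepsilon) - sa_n)^2/4 + s^2\eta_n^2} - \sqrt{(\theta_n(\varepsilon) - a_n)^2/4 + \eta_n^2}\right|\Big).
\end{aligned}
$$

We note the elementary inequality $|x^{1/2} - y^{1/2}| \le 2^{-1}z^{-1/2}|x - y|$ for positive $x$, $y$, $z$ satisfying $\min(x, y) \ge z$. Using this inequality with $z = (1 - c(\varepsilon)n^{-1/2})^2\eta_n^2$ twice, we obtain for every $s \in [1 - c(\varepsilon)n^{-1/2}, 1 + c(\varepsilon)n^{-1/2}]$ and $n > n_0(c(\varepsilon))$

$$
\begin{aligned}
&\left|p_{A,n}(\theta_n(\varepsilon); 1, s\eta_n, sa_n, sa_n) - p_{A,n}(\theta_n(\varepsilon); 1, \eta_n, a_n, a_n)\right| \\
&\quad \le (2\pi)^{-1/2} n^{1/2}|s - 1|\left(a_n + \left[(1 - c(\varepsilon)n^{-1/2})^2\eta_n^2\right]^{-1/2}\left[\theta_n(\varepsilon)a_n/2 + (s + 1)((a_n^2/4) + \eta_n^2)\right]\right).
\end{aligned}
$$

Since $1 - c(\varepsilon)n^{-1/2} > 1/2$ for $n > n_0(c(\varepsilon))$ by the choice of $n_0(c(\varepsilon))$ and since $a_n/\eta_n \le c_3$ we obtain

$$
\begin{aligned}
&\left|p_{A,n}(\theta_n(\varepsilon); 1, s\eta_n, sa_n, sa_n) - p_{A,n}(\theta_n(\varepsilon); 1, \eta_n, a_n, a_n)\right| \\
&\quad \le (2\pi)^{-1/2} c(\varepsilon)\left(a_n + 2\eta_n^{-1}\left[a_n^2 + (5/2)((a_n^2/4) + \eta_n^2)\right]\right) \\
&\quad \le (2\pi)^{-1/2} c(\varepsilon)\left(c_3 + (13/4)c_3^2 + 5\right)\eta_n = c_4(\varepsilon)\eta_n \qquad (32)
\end{aligned}
$$

for every $n > n_0(c(\varepsilon))$ and $s \in [1 - c(\varepsilon)n^{-1/2}, 1 + c(\varepsilon)n^{-1/2}]$. Now,

$$
\begin{aligned}
\inf_{\theta \in \mathbb{R}} \int_0^\infty p_{A,n}(\theta; 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds
&\le \int_0^\infty p_{A,n}(\theta_n(\varepsilon); 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \\
&= \int_{1 - c(\varepsilon)n^{-1/2}}^{1 + c(\varepsilon)n^{-1/2}} p_{A,n}(\theta_n(\varepsilon); 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \\
&\quad + \int_{\{s:\, |s - 1| > c(\varepsilon)n^{-1/2}\}} p_{A,n}(\theta_n(\varepsilon); 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \\
&=: B_1 + B_2.
\end{aligned}
$$

Clearly, $0 \le B_2 \le \varepsilon$ holds by the choice of $c(\varepsilon)$, see Lemma 13. For $B_1$ we have using (32)

$$
\begin{aligned}
&\left|B_1 - p_{A,n}(\theta_n(\varepsilon); 1, \eta_n, a_n, a_n)\right| \\
&\quad \le \int_{1 - c(\varepsilon)n^{-1/2}}^{1 + c(\varepsilon)n^{-1/2}} \left|p_{A,n}(\theta_n(\varepsilon); 1, s\eta_n, sa_n, sa_n) - p_{A,n}(\theta_n(\varepsilon); 1, \eta_n, a_n, a_n)\right| h_n(s)\, ds + \varepsilon \\
&\quad \le c_4(\varepsilon)\eta_n + \varepsilon
\end{aligned}
$$

for $n > n_0(c(\varepsilon))$. It follows that

$$
\inf_{\theta \in \mathbb{R}} \int_0^\infty p_{A,n}(\theta; 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \le p_{A,n}(\theta_n(\varepsilon); 1, \eta_n, a_n, a_n) + c_4(\varepsilon)\eta_n + 2\varepsilon
$$

holds for $n > n_0(c(\varepsilon))$. Furthermore, the absolute difference between $p_{A,n}(\theta_n(\varepsilon); 1, \eta_n, a_n, a_n)$ and $\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in C_{A,n})$ can be bounded as follows: Using Proposition 3, (16), observing that $\Phi$ has Lipschitz constant $(2\pi)^{-1/2}$, and using the elementary inequality noted earlier twice with $z = \eta_n^2$ we obtain

$$
\begin{aligned}
&\left|p_{A,n}(\theta_n(\varepsilon); 1, \eta_n, a_n, a_n) - \left[\Phi\left(n^{1/2}\sqrt{a_n^2 + \eta_n^2}\right) - \Phi(n^{1/2}(-a_n + \eta_n))\right]\right| \\
&\quad \le (2\pi)^{-1/2} n^{1/2}\left|-a_n c(\varepsilon)n^{-1/2} + \sqrt{a_n^2(1 + c(\varepsilon)n^{-1/2})^2 + \eta_n^2} - \sqrt{a_n^2 + \eta_n^2}\right| \\
&\qquad + (2\pi)^{-1/2} n^{1/2}\left|\sqrt{(a_n c(\varepsilon)n^{-1/2})^2 + \eta_n^2} - (a_n c(\varepsilon)n^{-1/2} + \eta_n)\right| \\
&\quad \le (2\pi)^{-1/2}\left(2a_n c(\varepsilon) + (2\eta_n)^{-1}a_n^2(2c(\varepsilon) + c(\varepsilon)^2 n^{-1/2})\right) \\
&\quad \le (2\pi)^{-1/2}\left(2c_3 c(\varepsilon) + 2^{-1}c_3^2(2c(\varepsilon) + c(\varepsilon)^2)\right)\eta_n = c_5(\varepsilon)\eta_n.
\end{aligned}
$$

Consequently, for $n > n_0(c(\varepsilon))$

$$
\begin{aligned}
\inf_{\theta \in \mathbb{R}} \int_0^\infty p_{A,n}(\theta; 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds
&\le \Phi\left(n^{1/2}\sqrt{a_n^2 + \eta_n^2}\right) - \Phi(n^{1/2}(-a_n + \eta_n)) \\
&\quad + (c_4(\varepsilon) + c_5(\varepsilon))\eta_n + 2\varepsilon.
\end{aligned}
$$

On the other hand,

$$
\begin{aligned}
\inf_{\theta \in \mathbb{R}} \int_0^\infty p_{A,n}(\theta; 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds
&\ge \int_0^\infty \inf_{\theta \in \mathbb{R}} p_{A,n}(\theta; 1, s\eta_n, sa_n, sa_n)\, h_n(s)\, ds \\
&= \int_0^\infty \left[\Phi\left(n^{1/2}s\sqrt{a_n^2 + \eta_n^2}\right) - \Phi(n^{1/2}s(-a_n + \eta_n))\right] h_n(s)\, ds \\
&\ge \Phi\left(n^{1/2}\sqrt{a_n^2 + \eta_n^2}\right) - \Phi(n^{1/2}(-a_n + \eta_n)) - 2\|\Phi - T_{n-1}\|_\infty.
\end{aligned}
$$

Since $\eta_n \to 0$ and $\|\Phi - T_{n-1}\|_\infty \to 0$ for $n \to \infty$ and since $\varepsilon$ was arbitrary the proof is complete. ■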



Lemma 13 Suppose $\sigma = 1$. Then for every $\varepsilon > 0$ there exists a $c = c(\varepsilon) > 0$ such that

$$
\int_{\max(0,\, 1 - cn^{-1/2})}^{1 + cn^{-1/2}} h_n(s)\, ds \ge 1 - \varepsilon
$$

holds for every $n \ge 2$.

Proof. By the central limit theorem and the delta-method we have that $n^{1/2}(\hat\sigma - 1)$ converges to a normal distribution. It follows that $n^{1/2}(\hat\sigma - 1)$ is (uniformly) tight. In other words, for every $\varepsilon > 0$ we can find a real number $c > 0$ such that for all $n \ge 2$ holds

$$
\Pr\left(|n^{1/2}(\hat\sigma - 1)| \le c\right) \ge 1 - \varepsilon.
$$

■
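The tightness statement of Lemma 13 can be made tangible by simulation. The Python sketch below assumes that $\hat\sigma^2$ is the usual unbiased variance estimator, so that $(n-1)\hat\sigma^2$ follows a chi-square distribution with $n - 1$ degrees of freedom when $\sigma = 1$; this is an assumption about the definition of $\hat\sigma$, which is given outside this part of the paper. The function name, the value $c = 2$, the number of draws, and the seed are arbitrary illustrative choices.

```python
import math
import random


def tightness_probability(n, c, draws=20000, seed=0):
    """Monte Carlo estimate of P(|n^{1/2}(sigma_hat - 1)| <= c) for sigma = 1.

    Assumes (n - 1) * sigma_hat^2 is chi-square distributed with n - 1
    degrees of freedom, drawn here as Gamma(shape=(n - 1)/2, scale=2).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        sigma_hat = math.sqrt(chi2 / (n - 1))
        if abs(math.sqrt(n) * (sigma_hat - 1.0)) <= c:
            hits += 1
    return hits / draws


sample_sizes = [2, 5, 20, 100, 500]
probabilities = [tightness_probability(n, c=2.0) for n in sample_sizes]
for n, p in zip(sample_sizes, probabilities):
    print(n, p)
```

A single constant $c$ serving all sample sizes at once is exactly what uniform tightness provides, and it is this uniformity that Step 4 of the proofs of Theorems 9 and 10 relies on.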



Lemma 14 Suppose $n \ge 2$ and $x \ge y \ge 0$. Then

$$
T_{n-1}(x) \le \Phi(x)
$$

and

$$
T_{n-1}(x - y) - T_{n-1}(-x - y) \le \Phi(x - y) - \Phi(-x - y).
$$

Proof. The first claim is well-known, see, e.g., Kagan and Nagaev (2008). The second claim follows immediately from the first claim, since by symmetry of $\Phi$ and $T_{n-1}$ we have

$$
\begin{aligned}
&\Phi(x - y) - \Phi(-x - y) - (T_{n-1}(x - y) - T_{n-1}(-x - y)) \\
&\quad = [\Phi(x - y) - T_{n-1}(x - y)] + [\Phi(x + y) - T_{n-1}(x + y)] \ge 0.
\end{aligned}
$$

■
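Both inequalities of Lemma 14 can be spot-checked numerically. The Python sketch below evaluates $T_{n-1}$ by Simpson integration of the t-density, a numerical shortcut chosen here to stay within the standard library, and counts violations on a small grid of $x \ge y \ge 0$. The function names, grid, and tolerance are illustrative choices, and a clean run is of course no substitute for the proof.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2.0 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """Distribution function T_df via composite Simpson integration on [0, |x|]."""
    if x == 0.0:
        return 0.5
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * t_pdf(i * h, df)
    return 0.5 + math.copysign(total * h / 3.0, x)


def count_violations(n, grid, tol=1e-9):
    """Number of grid points with x >= y >= 0 at which either claim fails."""
    df = n - 1
    violations = 0
    for x in grid:
        if t_cdf(x, df) > norm_cdf(x) + tol:
            violations += 1
        for y in grid:
            if y > x:
                continue
            lhs = t_cdf(x - y, df) - t_cdf(-x - y, df)
            rhs = norm_cdf(x - y) - norm_cdf(-x - y)
            if lhs > rhs + tol:
                violations += 1
    return violations


grid = [0.5 * k for k in range(11)]  # 0.0, 0.5, ..., 5.0
violations = {n: count_violations(n, grid) for n in (2, 5, 30)}
print(violations)
```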
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